ABSTRACT


We study a resource allocation problem in an intelligence setting. The intelligence cycle comprises
three phases: collection, processing, and analysis. Enhanced efficiency within the first two stages
directly affects the number and types of important items considered by analysts, increasing the
frequency with which the most important documents are reviewed. The dilemma is that an analyst
needs to quickly determine which sources to investigate in order to provide meaningful analysis in
response to a request for information with a concrete deadline. Initially, the value of each source is
unknown; so, too, is the probabilistic nature of the value derived from each item. Generally, more
sources and documents are available within a limited time frame than could ever be analyzed,
compounding the complexity of this problem. Our goal is to efficiently find the source that produces
the largest fraction of relevant items with respect to a request for information. By "efficiently," we
mean that the analyst judiciously balances exploration versus exploitation of the different sources.
As such, the theoretical framework for this problem is that of a multi-armed bandit, a classic
iterative decision learning process. This thesis presents a new approach to identifying the optimal
arm(s) of a multi-armed bandit with the largest or smallest quantile or superquantile risk, under a
loss constraint. This problem is important not only in intelligence applications, but also in marketing
and finance. We extend the existing theoretical framework for quantiles to a novel situation with
estimators of conditional expectations over an unknown quantile. Two sequential elimination
algorithms are developed that select the most important source for a given constraint level,
sampling from the arm(s) with the largest conditional expectation over a quantile.
Executive Summary


The intelligence cycle can be considered to consist of three broad phases: collection, processing,
and analysis. In the second phase, the goal is to pass on items that are important, in relation to a
request for information, for processing by senior analysts. The objective is to create efficiencies
within the processing phase, reducing the number of analysts required for a task and making better
use of the total analyst resource. The dilemma is that an analyst needs to quickly determine which
sources to investigate in order to provide meaningful analysis in response to a request for
information with a finite and concrete deadline. Adding further complexity, there generally exist
more sources and documents than can ever be analyzed within the time frame given. Our goal in this
thesis is to produce algorithms that efficiently discover the most relevant intelligence source(s) to
analyze, so that analysts spend less time processing data and more time delivering critical insights.
The essence of this work is a resource allocation problem within an intelligence setting, and we
derive the following organizational impacts as our primary motivation:

1. a decrease in the total time that an analyst spends processing data, 

2. a decrease in the required number of analysts for a particular task, resulting in a reduction of 
resource allocation waste, 

3. an increase in the time that each analyst delivers insights from their analysis of intelligence 
information, 

4. an increase in the total intelligence product output of an agency or organization, and 

5. an increase in the tempo of an agency or organization: delivering more intelligence faster. 

Our theoretical framework for this problem is that of a stochastic multi-armed bandit, a classic
iterative probabilistic decision learning problem. The goal of the multi-armed bandit is to determine
the optimal trade-off between exploration and exploitation. The classic multi-armed bandit concept
stems from gamblers in a casino playing slot machines (the term bandit comes from the colloquial
gambler term for a slot machine). The gambler must choose the number of times to play each
machine as well as the order in which to play them. When a bandit's arm is pulled, an immediate
random reward is observed from an underlying probability distribution that is specific to that
machine and unknown to the gambler. The gambler's objective is to maximize the cumulative reward
over the number of plays. Within the stochastic setting, we note that each arm of the bandit can have
a distinct probability distribution that determines the sequence of rewards observed.

In this thesis, we address a new approach to identifying the optimal arm(s) of a bandit with the
largest or smallest quantile or superquantile risk, under constraints. This is analogous to a root
finding problem in a stochastic setting and is important not only in intelligence applications, but
also in online marketing and quantitative finance. Quantile risk, more commonly known as
value-at-risk, is one of the most widely used risk metrics within the financial engineering community.
The superquantile risk is an improved metric, known within the quantitative finance community as
conditional value-at-risk; it is a coherent [1], regular [2], and convex [3] measure that seeks to model
the distributional behaviour of risk, quantifying the expected losses that may be seen within the
tail [4].

We extend the existing theoretical framework for quantiles, as seen in [5] and [6], to a novel
situation with estimators of conditional expectations over an unknown quantile. Two sequential
elimination algorithms are developed that select the most important source for a given constraint
level, sampling from the arm(s) with the largest conditional expectation over a quantile. In the
aforementioned intelligence setting, this translates into efficiently determining the source that
produces, on average, the largest fraction of items of a given quality; the idea is that each request
for information has a particular quality stipulation.
CHAPTER 1:
Introduction 


1.1 Introduction 

The purpose of this chapter is to provide an overview of the thesis, giving readers a contextual
understanding of the work. The thesis scope is given, and we subsequently frame the problem by
defining the operational motivation through two lenses: firstly a direct military application, and
secondly a non-military application in marketing. We undertake a brief discussion of risk within the
motivating framework presented, leading to the research questions that follow. Chapter 1 concludes
with a snapshot of our contributions and the structure of the remainder of the thesis.


1.2 Scope 

This thesis deals with intelligence analysis techniques and procedures in environments that change in
real time. From a technical standpoint, we employ ideas from the machine learning and stochastic
optimization communities of operations research. We develop a model, analyze it, and numerically
simulate its performance on constructed data. Decision support tools and performance metrics with
live data are beyond the scope of this thesis, but could readily be implemented with the algorithms
that appear in Chapter 3.


1.3 Motivation 

Two primary settings have been considered as motivations for this research. The first application 
stems from intelligence operations and the second from the field of financial engineering risk 
management. 

1.3.1 Models of Intelligence Operations 

The intelligence cycle consists of five broad phases that link the direction of objectives, through
collection, processing, and analysis, to outcomes for dissemination. Throughout the work presented
here, we consider three key stages of information transformation, consisting of collection,
processing, and analysis, as shown in Figure 1.1. As such, the intelligence process can be thought of
as consisting of two stages prior to intelligence dissemination and integration [1]. This perspective
is briefly contrasted with the intelligence cycle shown in Figure 1.2. The sub-processes of the three
initial stages are:

1. Collection: Raw intelligence items are collected from a source and collated at an intelligence
cell.

2. Processing: Processors manipulate the raw intelligence items and identify those that may warrant
further effort from analysts, from which meaningful outcomes can be derived.

3. Analysis: The professional analyst evaluates the importance of this information and delivers
output product in a timely manner that provides a warfighting advantage and tangible outcomes.

By creating efficiencies within the first two stages of the intelligence process, we observe an
improvement in the analysis phase, where the majority of resourced effort is expended. As a result,
the most important items remain under consideration by the analysts for a greater amount of time.
We can think of this process in terms of a signal processing analogy, where valuable intelligence is
considered the true signal and non-valuable intelligence the noise. Here, we attempt to remove as
much of the noise from the signal as possible, whilst maximizing the time spent analyzing the true
signal. The question for us is: which source should an analyst explore in any given time period? The
goal is to determine which intelligence source to sample from in order to yield the greatest value.
The basic workflow of an intelligence request is summarized in Figure 1.1 and Figure 1.2; our
modelling of this problem framework consists of the following key stages:

1. A requirement for specific intelligence is received with a finite deadline by an intelligence 
organization. 

2. A manager provides an analyst with the desired average importance level for an item, in 
accordance with organisational priorities. 

3. The analyst now decides which source to explore, and determines the importance of each
generated item in relation to the request for information.

4. The item is passed on if its importance is over the threshold. A source is selected so the 
average importance of items with importance over the threshold equals the desired value of 
step 2. 

5. There generally exist more sources than can feasibly be explored in a given time frame in order
to obtain a relevant intelligence picture. The problem is that the analyst does not initially know
which sources tend to produce a large fraction of items with importance over the threshold.

6. Exploration vs. Exploitation. As the analyst assesses those sources, an understanding of the
source(s) that tend to yield items over the importance threshold will be attained. From here, the
analyst can focus on the most reliable source(s) to deliver important items in relation to the
request for information.
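Steps 4 and 5 can be made concrete with a small sketch. Under one possible reading (our assumption, not the algorithms of Chapter 3), each source's threshold is the lowest importance score such that the items at or above it still average at least the desired value from step 2, and the source with the largest fraction of such items is preferred. The sketch works on importance scores that have already been observed; the function names `passable_fraction` and `best_source`, and the example scores, are hypothetical.

```python
from typing import Dict, List, Tuple


def passable_fraction(scores: List[float], target_avg: float) -> float:
    """Largest fraction of items whose average importance is at least target_avg.

    Items are ranked by importance (highest first). The running mean of the top-j
    items can only decrease as j grows, so the threshold is the j-th score for the
    largest j whose running mean still meets the target.
    """
    if not scores:
        return 0.0
    ranked = sorted(scores, reverse=True)
    total, best_j = 0.0, 0
    for j, s in enumerate(ranked, start=1):
        total += s
        if total / j >= target_avg:
            best_j = j
        else:
            break  # adding lower-ranked items only lowers the mean further
    return best_j / len(ranked)


def best_source(samples: Dict[str, List[float]],
                target_avg: float) -> Tuple[str, Dict[str, float]]:
    """Return the source with the largest passable fraction, plus all fractions."""
    fractions = {name: passable_fraction(xs, target_avg) for name, xs in samples.items()}
    return max(fractions, key=fractions.get), fractions


# Hypothetical importance scores in [0, 1] observed from two sources.
observed = {"A": [0.9, 0.8, 0.1, 0.2], "B": [0.7, 0.1, 0.1, 0.1]}
choice, fractions = best_source(observed, target_avg=0.6)
print(choice, fractions)
```

Here source A passes three of its four items (their mean is about 0.63), while source B passes only one. In practice the scores arrive one at a time, which is exactly where the exploration vs. exploitation trade-off of step 6 enters.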



Figure 1.1. The Transformation of Data From Information Through to Intelligence. Source: [1].


The Intelligence Process 



Figure 1.2. The Cycle of Intelligence Production. Source: [1].
1.4 Quantifying Risk

1.4.1 Quantile and Superquantile Risk 

Quantile risk, more commonly known as Value-at-Risk (VaR), is a prevalent risk metric within the
financial engineering community. The superquantile risk is an improved measure of risk, known
within the quantitative finance community as Conditional Value-at-Risk (CVaR). It has superior
mathematical properties (see [2], [3], and [4]) and seeks to model the behaviour of risk by
quantifying losses that may be seen in extreme cases [5]. In practice, each risk measure has both
advantages and disadvantages; some of these are depicted in Table 1.1. A more technical discussion
is provided in Chapter 2.


Table 1.1. Basic Comparison of VaR and CVaR.

Case                                            Value-at-Risk   Conditional Value-at-Risk
Less restrictive at the same confidence level   X
Useful when model tails are available                           X
Useful when model tails are not available       X
Simple to optimize                                              X
Has mathematically superior properties                          X
Risk averse (conservative estimates)                            X

This table indicates the usage of value-at-risk and conditional value-at-risk for
various cases. Source: [6].


1.4.2 Risk Management and Marketing 

For situations where the maximum expected loss over a threshold is given, the agent in our
scenario wishes to discover the bandit arm with the largest or smallest threshold that satisfies the
desired level (the loss constraint in our technical problem), or the arm with the largest or smallest
probability of exceeding the threshold that meets our constraint. These problems arise naturally in
risk portfolio analysis, where the loss threshold is known as VaR, and the expected conditional loss
over the worst 100α percent of scenarios is known as the CVaR at level α; for more information
see [7], [3], and [8].

From an online marketing perspective, each arm corresponds to a marketing campaign for some
product. The input is a number C that represents the average quality (e.g., a function of age, income,
gender, etc.) of the individuals desired by the seller. The conditional value-at-risk is the fraction of
people generated by the marketing campaign who have an average quality C; hence, the retailer
wishes to find the marketing campaign with the largest conditional value-at-risk. The quality of
individuals generated by each marketing campaign is analogous to the importance of items
generated by an intelligence source.


1.5 Research Questions 

We use the following guiding questions as a framework to navigate and unpack the body of work 
undertaken. 

1. Given K systems with unknown distributions, we seek to find the system with the largest or
smallest CVaR or VaR, with probability at least 1 − δ. How can this be achieved?

2. What is the expected computational cost of solving the problem described above, and how 
does it depend on the problem parameters? 

3. How does the approach of question 1 compare with other existing methods? 

4. How can this CVaR or VaR selection framework fit as part of an intelligence source decision 
model? 

1.6 Contributions 

Stochastic root finding is concerned with the problem of finding the roots of a function
$f(x) = E_\theta[F(\theta, x)]$, that is, the expectation of a function F of a random vector θ. The
primary techniques used to achieve this are sample average approximation and stochastic
approximation [9]. Our work is the first to apply the so-called probably approximately correct
framework to a stochastic root finding setting, of which the value-at-risk and conditional
value-at-risk are two critical cases. Our proof technique is based upon a coupling argument that
obtains the bounds required to implement a probably approximately correct algorithm. Two papers
deal with quantiles in a probably approximately correct framework, [10] and [11]; however, we have
not found any papers that study stochastic root finding within the probably approximately correct
framework.
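As a minimal illustration of sample average approximation applied to stochastic root finding, the sketch below takes $F(\theta, x) = \mathbf{1}\{\theta \le x\}$, so that solving $E_\theta[F(\theta, x)] = \tau$ recovers the τ-quantile (the value-at-risk case). The expectation is replaced by an average over N simulated draws of θ, and the resulting monotone equation is solved by bisection. The standard normal θ, the name `saa_root`, and the bracketing interval are illustrative assumptions, not the method developed in this thesis.

```python
import math
import random
from typing import Callable, Sequence


def saa_root(F: Callable[[float, float], float], thetas: Sequence[float],
             target: float, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Smallest x (to within tol) with mean_i F(theta_i, x) >= target.

    Assumes the sample average is non-decreasing in x, is below target at lo,
    and is at least target at hi.
    """
    def f_N(x: float) -> float:
        # Sample average approximation of E[F(theta, x)].
        return sum(F(t, x) for t in thetas) / len(thetas)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_N(mid) >= target:
            hi = mid  # the root lies at or below mid
        else:
            lo = mid
    return hi


def indicator(theta: float, x: float) -> float:
    return 1.0 if theta <= x else 0.0


rng = random.Random(0)
thetas = [rng.gauss(0.0, 1.0) for _ in range(1000)]
tau = 0.9
x_hat = saa_root(indicator, thetas, tau, lo=-10.0, hi=10.0)

# Cross-check against the empirical quantile inf{x : tau <= F_N(x)}.
order_stat = sorted(thetas)[math.ceil(len(thetas) * tau) - 1]
print(x_hat, order_stat)
```

For this particular F the root is just an order statistic, so bisection is unnecessary; it is used here because the same loop applies to any F whose sample average is monotone in x.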


1.7 Thesis Outline 

This thesis is organized into five primary chapters. Following this introductory chapter, Chapter 2
provides background on the technical problem through a literature review. Major topics in learning
theory, as well as a specific introduction to the multi-armed bandit, are given; a discussion of
previous related work is also presented there. Chapter 3 presents the mathematical derivations of
the algorithms, as well as linking them to the operational setting discussed in the preceding
chapters. Numerical analysis of the proposed algorithms is presented in Chapter 4, indicating the
performance of each algorithm in both high- and low-dimensional settings. The final chapter
summarizes the research that has been undertaken and looks toward the future, providing a defined
way forward for further advancements in this research domain. Non-technical readers are advised
to read only the beginning of Chapter 3 and scan the remainder.
CHAPTER 2:

Background and Literature Review 


This chapter provides background on material relevant to the work presented in subsequent
chapters. Beginning with a discussion on reinforcement learning, a broad link is made between a
computational approach to learning and our problem specifically. We then investigate in depth
the main problem for our research setting, the multi-armed bandit. We close with a discussion of
contemporary literature that leads us to our contributions to the field, presented in Chapter 3.


2.1 Background 

A number of approaches and techniques in use at present grew out of work from the last two
centuries. Learning, from a computational perspective, is concerned with the actions taken by an
agent in order to maximize a cumulative reward. Within computational machine learning there exist
five key paradigms of learning: supervised, unsupervised, online, active, and reinforcement learning.

1. Supervised learning. There are two main categories of algorithms within the supervised
learning paradigm: classification and regression. The algorithms are trained on a known, labelled
dataset to make predictions. When applied to test data whose labels are unknown, the algorithms
use what they learned from the training set to make predictions about the test set. Supervised
learning is commonly used in applications such as financial credit risk analysis, algorithm-based
trading strategies and classifiers, and email spam filters. Nominally, we can use supervised learning
in situations where we require pattern recognition of the data to be undertaken [12].

2. Unsupervised learning. The family of techniques under the banner of unsupervised learning
uses unlabelled data to make inferences, gain insights, and find patterns. Cluster analysis is the
most widely used unsupervised learning method and is used to find hidden patterns in such data.
Unsupervised learning algorithms are commonly used in applications such as data pattern mining,
computer vision object recognition, and natural language processing [12].

3. Online learning. Within the online learning framework, we attempt to answer a series of
questions. In each iteration, we learn the answer to the previously posed question without delay;
this is the online component and the distinguishing feature of this style of learning. Online systems
are widely applied in society. Such applications include recommender systems, where the Netflix
recommender problem is a classic example of this algorithm in use. Here, a user watches a film and
provides immediate feedback to the system, feeding the algorithm and enabling further training of
the recommender system as to what films the user may enjoy in the future. The notion of regret is
introduced here as the difference between the system recommendations and the like or dislike of
the user after viewing. We can measure the average success of the system in predicting a film for
the user to watch and obtain a long-run appreciation of how it performs [13].

4. Active learning. As opposed to using the entire dataset, as with supervised and unsupervised
learning, the critical idea that distinguishes active learning is that it actively selects the subsets of
the dataset whose labels it will learn from. Such problems arise from a framework where unlabelled
data exists, but labels cannot easily be obtained for prohibitive reasons. Such reasons could include
the cost of labelled data, the time required to label the dataset manually, or simply that the labels
are incomplete. The primary contrast with unsupervised learning is the ability to interactively query
the user and obtain new labels. Active learning is also known as query learning or optimal
experimental design in the machine learning literature, with applications in speech recognition,
information extraction, and classification and filtering [14].

5. Reinforcement learning. Our final and most important computational learning concept
(in the context of this thesis) is reinforcement learning. While it is commonly thought that
reinforcement learning is a subset of unsupervised learning, this is not quite correct. Reinforcement
learning is distinct in that it tries to maximise a reward, such as in a Markov Decision Process, as
opposed to relying on hidden structure, as in unsupervised learning. The seminal problem in
reinforcement learning is to maximise the reward of an agent, which leads us to the problem of
exploration vs exploitation, a concept we will cover in detail. Within reinforcement learning, the
agent must exploit its current environmental knowledge to obtain a reward, whilst also exploring
its surroundings in order to aid decision-making in the future. In the general setting, neither a purely
exploratory nor a purely exploitative policy can be pursued exclusively; as such, an efficient
trade-off is required. Reinforcement learning considers the problem where an agent with specific
goals interacts with an uncertain environment [15]. The paradigm of reinforcement learning is the
setting we find ourselves in for the remainder of this thesis.

It is important to define a limited number of critical terms for use throughout the remainder of this
thesis. The terms defined below have been adapted from [15].

1. Learner (agent). The learner, also commonly known as the agent, is the subject who takes
actions based on inputs from the environment in an attempt to maximize their observed reward.

2. Policy. A policy is the way in which the learner behaves at a given iteration (time) step, based
on the history of rewards earned and actions taken to date.

3. Reward. In each iterative step, a (typically random) reward is received, which then influences
the action taken in the next step. Earning rewards is the goal of the agent.

4. Regret. The regret is the difference between the reward that could have been earned with
more complete information (see below) and the reward earned from the policy implemented.
A good policy is one whose regret grows slowly.

An important aspect of some learning problems is the trade-off between exploration and
exploitation. In a pure exploitation policy, the learner seeks to exploit the best of what is already
known without considering alternative actions. In contrast, under a pure exploration policy, the
learner attempts to take as many different actions as possible in order to make better selections in
future iterations. While very specific exploitation-only and exploration-only algorithms exist, the
overwhelming body of work has focused on developing hybrid algorithms that manage this
trade-off efficiently. These algorithms aim to balance exploiting the best action seen previously
against exploring new actions at random iterations, according to a set policy.
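One well-known hybrid of this kind is the ε-greedy policy, which exploits the arm with the best empirical mean most of the time and explores a uniformly random arm with probability ε. The sketch below is illustrative only: the Bernoulli reward model, the value ε = 0.1, and the function name `epsilon_greedy` are assumptions, and ε-greedy is not one of the algorithms developed in this thesis.

```python
import random
from typing import List, Tuple


def epsilon_greedy(arm_means: List[float], n_rounds: int, epsilon: float = 0.1,
                   seed: int = 0) -> Tuple[List[int], List[float]]:
    """Play Bernoulli arms with an epsilon-greedy policy; return pull counts and reward sums."""
    rng = random.Random(seed)
    K = len(arm_means)
    counts = [0] * K
    sums = [0.0] * K
    for t in range(n_rounds):
        if t < K:
            arm = t  # pull each arm once so every empirical mean is defined
        elif rng.random() < epsilon:
            arm = rng.randrange(K)  # explore: uniformly random arm
        else:
            arm = max(range(K), key=lambda i: sums[i] / counts[i])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


counts, sums = epsilon_greedy([0.2, 0.5, 0.8], n_rounds=2000)
print(counts)
```

Because ε is fixed, this policy keeps exploring at a constant rate, so its regret grows linearly in the number of rounds; decaying ε over time is a common refinement.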


2.2 Multi-armed Bandits 

The problem of multi-armed bandits first appeared in the academic literature of the 1930s; however,
it gained little traction in mathematical communities, as it was thought that no closed-form
analytical solution to the problem existed. The seminal paper [16] set the framework for a
reinvigoration of interest in the suite of now ubiquitous multi-armed bandit problems. With the
explosion of work on machine learning problems, the multi-armed bandit has seen consistent
application towards this endeavor.

The multi-armed bandit is an iterative probabilistic decision learning problem in which, at each
discrete time step, the player chooses one of K arms of a bandit and observes a reward. The aim is
to select arms so as to maximize the cumulative reward seen over a finite time horizon or,
alternatively, to minimize the regret relative to the optimal selection possible.

For each time period t, an agent selects a single arm $I_t \in \{1, \dots, K\}$ and receives the scalar
reward $X_{I_t,t}$, where K is the number of arms. In the base case of the multi-armed bandit
problem, we consider problems in which the cumulative reward is to be maximized. Following [17],
the regret from the optimal selection after n rounds is defined as

$$R_n = \max_{i=1,\dots,K} \sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}, \qquad (2.1)$$
where $I_t$ is the selected arm at time t, with associated reward $X_{I_t,t}$. While a general notion
of regret has been given, no formal definition of its averaged forms has yet been provided. We
define two key forms of regret, namely expected regret and pseudo-regret.


1. Expected Regret. The expected regret is the expected difference observed by an agent, with
reference to an optimal action for the sequence of realized rewards [17].

$$\mathbb{E}[R_n] = \mathbb{E}\left[\max_{i=1,\dots,K} \sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}\right]. \qquad (2.2)$$


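2. Pseudo-Regret. The pseudo-regret is a weaker form of regret, as an agent competes only
against an optimal action, in expectation [17].

$$\bar{R}_n = \max_{i=1,\dots,K} \mathbb{E}\left[\sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}\right]. \qquad (2.3)$$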


2.2.1 The Stochastic Multi-armed Bandit 

The stochastic multi-armed bandit was initially presented by [16], introducing the technique to
analyze upper confidence bounds for regret. The generalized form of the stochastic multi-armed
bandit is defined in Algorithm 1; it should be noted that the underlying distribution of each arm does
not change across iterations. The reward observed in each time period is a random sample drawn
from that arm's distribution [17].

Algorithm 1 The stochastic bandit problem
1: Known parameters: number of arms K and (possibly) number of rounds n ≥ K.

2: Unknown parameters: K probability distributions ν_1, ..., ν_K on [0, 1].

3: For each round t = 1, 2, ...

   1. the forecaster chooses I_t ∈ {1, ..., K};

   2. given I_t, the environment draws the reward X_{I_t,t} ∼ ν_{I_t} independently from the past and
   reveals it to the forecaster.
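The protocol of Algorithm 1 can be simulated directly. In the sketch below, which assumes Bernoulli distributions on [0, 1] and indexes arms from 0, the environment holds the unknown means and the forecaster is any function of the observed history; the round-robin forecaster and all names used are hypothetical choices for illustration.

```python
import random
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]  # (chosen arm I_t, observed reward X_{I_t,t})


def play_stochastic_bandit(means: List[float], n_rounds: int,
                           forecaster: Callable[[History], int],
                           seed: int = 0) -> History:
    """Run the stochastic bandit protocol with Bernoulli(means[i]) reward distributions."""
    rng = random.Random(seed)
    history: History = []
    for _ in range(n_rounds):
        arm = forecaster(history)  # the forecaster sees only past choices and rewards
        # The environment draws X ~ nu_arm independently of the past and reveals it.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        history.append((arm, reward))
    return history


def round_robin(K: int) -> Callable[[History], int]:
    """A simple forecaster that cycles through the K arms in order."""
    return lambda history: len(history) % K


hist = play_stochastic_bandit([0.3, 0.6, 0.9], n_rounds=9, forecaster=round_robin(3))
print(hist)
```

Any policy, including the elimination algorithms discussed below, can be plugged in as the forecaster, since it only needs the history to choose the next arm.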


The primary metric of interest for this family of algorithms is the pseudo-regret. The pseudo-regret 
of a stochastic multi-armed bandit is defined as a special form of the general pseudo-regret, given 
in Equation 2.3 as 


$$\bar{R}_n = n\mu^* - \sum_{t=1}^{n} \mathbb{E}[\mu_{I_t}], \qquad (2.4)$$
where $\mu^* = \max_{i=1,\dots,K} \mu_i$ and $\mu_{I_t}$ is the mean of arm $I_t$. An underpinning
property of the stochastic multi-armed bandit is that good policies achieve a pseudo-regret that
grows logarithmically, O(log n), and that this logarithmic rate cannot be improved upon in general.
Equation 2.4 can also be expressed as

$$\bar{R}_n = \sum_{i=1}^{K} \Delta_i \, \mathbb{E}[T_i(n)], \qquad (2.5)$$

where $T_i(n)$ is the number of pulls of arm i by time n, and $\Delta_i = \mu^* - \mu_i$ for all
$i \in \{1, \dots, K\}$ [17].
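The equivalence of Equations 2.4 and 2.5 is a regrouping of the sum by arm: each pull of arm i contributes $\mu^* - \mu_i = \Delta_i$ to the regret. The sketch below checks this for one fixed sequence of chosen arms, that is, for the quantities inside the expectations; the means and the arm sequence are arbitrary illustrative values.

```python
import math
from typing import List


def pseudo_regret_direct(means: List[float], arms: List[int]) -> float:
    """n * mu_star - sum_t mu_{I_t} (Equation 2.4 before taking expectations)."""
    mu_star = max(means)
    return len(arms) * mu_star - sum(means[i] for i in arms)


def pseudo_regret_gaps(means: List[float], arms: List[int]) -> float:
    """sum_i Delta_i * T_i(n) (Equation 2.5 before taking expectations)."""
    mu_star = max(means)
    T = [0] * len(means)
    for i in arms:
        T[i] += 1  # T_i(n): number of pulls of arm i
    return sum((mu_star - mu) * T_i for mu, T_i in zip(means, T))


means = [0.2, 0.5, 0.9]
arms = [0, 1, 2, 2, 1, 2]  # I_1, ..., I_6 (0-indexed arms)
a, b = pseudo_regret_direct(means, arms), pseudo_regret_gaps(means, arms)
print(a, b, math.isclose(a, b))
```

Both forms give 1.5 here: arm 0 is pulled once with gap 0.7 and arm 1 twice with gap 0.4. The gap form makes explicit that a policy keeps regret low by pulling arms with large $\Delta_i$ rarely.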

2.2.2 Time Horizons 

It is important to distinguish between the infinite and finite time horizon cases. In the finite
time case, we seek to select the optimal arm with a probability of at least 1 − δ, for a sufficiently
small δ ∈ (0, 1). The alternative is the infinite time scenario, which is not considered in this
thesis because time horizons are very much finite in the intelligence setting.

2.2.3 Key Algorithms 

A number of foundational algorithms are critical to framing our work presented herein. These form 
the basis for the main body of research leading to our work presented in the subsequent chapters. 

First, we consider the multi-armed bandit problem within the context of a probably approximately
correct model. The first such work is [18], which provides an algorithm to find the arm with the
largest expected reward with probability at least 1 − δ, where δ ∈ (0, 1) is a parameter selected by
the agent. The successive elimination algorithm sequentially samples from the remaining candidate
arms in each iteration, recording an observation and recalculating all summary values. At each time
period, any arm whose empirical mean is sufficiently small is removed from further consideration,
thus shrinking the feasible set of arms. For arms with distributions supported over [−b, b], for
b > 0, [18] shows that the expected number of observations until the algorithm terminates is of the
order



for a total of K arms, when the goal is to find an optimal arm with probability of at least 1 − δ. The
authors show that this computational complexity is the lowest possible, up to the leading order.
Lower bounds on the sample complexity of value-based probably approximately correct bandit
algorithms were studied in detail in [19].

The algorithm derived by [20] appears below. The parameter α_n is the elimination threshold at
stage n, and depends on the arms' distributions. For arms with support over [−b, b], for b > 0, it is

$$\alpha_n = b\sqrt{\frac{2}{n}\ln\left(\frac{\pi^2 K n^2}{3\delta}\right)}.$$


Algorithm 2 Successive elimination algorithm
1: Set n = 1 and S = {1, 2, ..., K}.

2: Set X̄(i) = 0 for each arm i;

3: Repeat

• Sample every arm i ∈ S once and let X̄_n(i) be the average reward of arm i after n trials
(pulls);

• Let X̄_n(max) = max_{i∈S} X̄_n(i);

– For each arm i ∈ S such that X̄_n(max) − X̄_n(i) > 2α_n do

* set S = S − {i};

– end

• n = n + 1;

4: Until |S| = 1;
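A minimal Python sketch of Algorithm 2 follows. It assumes rewards lie in an interval of known width (width = 2b for support [−b, b]) and uses the Hoeffding-plus-union-bound radius $\alpha_n = \text{width}\sqrt{\ln(\pi^2 K n^2 / (3\delta)) / (2n)}$, which is our assumption for illustration rather than the exact constant of [20]. The names `successive_elimination` and `sample` are hypothetical, and a round cap is added only so the sketch always terminates.

```python
import math
import random
from typing import Callable, Tuple


def successive_elimination(sample: Callable[[int], float], K: int, delta: float,
                           width: float, max_rounds: int = 100_000) -> Tuple[int, int]:
    """Return (selected arm, number of rounds n) following the structure of Algorithm 2."""
    active = set(range(K))
    sums = [0.0] * K
    n = 0
    while len(active) > 1 and n < max_rounds:
        n += 1
        for i in active:
            sums[i] += sample(i)  # sample every remaining arm once
        means = {i: sums[i] / n for i in active}
        # Assumed radius: Hoeffding's inequality with a union bound over arms and rounds.
        alpha = width * math.sqrt(math.log(math.pi ** 2 * K * n ** 2 / (3 * delta)) / (2 * n))
        best = max(means.values())
        # Eliminate arms whose gap to the empirical leader exceeds 2 * alpha_n.
        active = {i for i in active if best - means[i] <= 2 * alpha}
    return max(active, key=lambda i: sums[i]), n


rng = random.Random(1)
arm_means = [0.1, 0.5, 0.9]
winner, rounds = successive_elimination(
    lambda i: 1.0 if rng.random() < arm_means[i] else 0.0, K=3, delta=0.05, width=1.0)
print(winner, rounds)
```

The empirical leader is never eliminated, so the active set cannot become empty; the number of rounds grows roughly with the inverse squared gap between the two best arms.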


Next, we discuss a sequential elimination algorithm closer to the focal problem of this thesis. The
qualitative probably approximately correct (QPAC) algorithm (Algorithm 3) is an iterative adaptive
elimination algorithm that probabilistically removes arms from consideration, based on the tests at
lines 9 and 11 of the algorithm. This algorithm aims to select the arm with the largest τ-quantile, up
to a resolution of ε (so-called (ε, τ)-optimal), with probability at least 1 − δ. The expected number
of samples required to determine the (ε, τ)-optimal arm with a probability of at least 1 − δ is
of order

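$$O\left(\sum_{k=1}^{K} \frac{1}{(\varepsilon \vee \Delta_k)^2} \log\frac{1}{(\varepsilon \vee \Delta_k)\,\delta}\right),$$

where a ∨ b = max(a, b) and Δ_k denotes the suboptimality gap of arm k. This is similar to the
sample complexity of Algorithm 2; thus, QPAC is shown to be optimal up to a logarithmic factor.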

Algorithm 3 relies on the empirical quantile

$$\hat{Q}_k^m(\tau) = \inf\{x \in \mathbb{R} : \tau \le \hat{F}_k^m(x)\},$$

where $\hat{F}_k^m(x)$ is the empirical distribution function of the rewards from arm k after m
samples. The operator ≺ (with non-strict version ≼) denotes the total order on the reward set over
which the algorithm operates, and $c_t(\delta)$ is the confidence radius at iteration t (equivalently,
after m = t samples per arm), an auxiliary function that determines the size of the elimination
confidence interval, defined in Equation 2.6. The parameters $x^+$ and $x^-$ are the thresholds
that take the place of α_n in Algorithm 2, as follows:


1. if the value of the arm is less than $x^-$, it is removed from further consideration,

2. if the value of the arm is greater than $x^+$, it is selected as the optimal arm, exiting the
algorithm,

3. if the value of the arm lies between $x^-$ and $x^+$, it remains under consideration.

The parameters $x^+$ and $x^-$ depend on the confidence radius, defined as

$$c_m(\delta) = \sqrt{\frac{1}{2m}\ln\left(\frac{\pi^2 K m^2}{3\delta}\right)}. \qquad (2.6)$$


This work leads us to Algorithm 3, in which we can note the underlying classic bandit framework
described in Algorithm 1. In each iteration, a sample $X_{k,t}$ is drawn from every candidate arm
in the set, followed by an update of the values $x^+$ and $x^-$, continuing until an arm is selected;
we substitute m samples for t time-steps in the algorithm notation because exactly one sample is
drawn from each active arm in each time step.


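Algorithm 3 QPAC($\delta$, $\epsilon$, $\tau$)
1   Set $\mathcal{A} = \{1, \ldots, K\}$    ▷ Active arms
2   $t = 1$
3   while $\mathcal{A} \neq \emptyset$ do
4       for $k \in \mathcal{A}$ do
5           Pull arm $k$ and observe $X_{k,t}$
6       $x^- = \max_{k \in \mathcal{A}} \hat{Q}_k(\tau - c_t(\delta))$
7       $x^+ = \max_{k \in \mathcal{A}} \hat{Q}_k(\tau + c_t(\delta))$
8       for $k \in \mathcal{A}$ do
9           if $\hat{Q}_k(\tau + c_t(\delta)) \prec x^-$ then
10              $\mathcal{A} = \mathcal{A} \setminus \{k\}$    ▷ Discard $k$ based on line 6
11          if $x^+ \preceq \hat{Q}_k(\tau + \epsilon - c_t(\delta))$ then
12              $\hat{k} = k$    ▷ Select $k$ according to line 7
13              BREAK
14      $t = t + 1$
15  return $\hat{k}$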
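The loop in Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: rewards are real numbers (so $\prec$ and $\preceq$ become `<` and `<=`), each arm is a hypothetical zero-argument sampling function, the confidence radius takes the form of Equation 2.6 with $K$ equal to the number of arms, and the selection test compares $x^+$ with the $\epsilon$-relaxed estimate $\hat{Q}_k(\tau + \epsilon - c_t(\delta))$. The names `empirical_quantile`, `conf_radius`, and `qpac` are illustrative.

```python
import bisect
import math
import random


def empirical_quantile(sorted_xs, tau):
    """Empirical quantile inf{x : tau <= F_hat(x)} of a sorted sample."""
    m = len(sorted_xs)
    if tau <= 0.0:
        return -math.inf  # every x satisfies F_hat(x) >= tau
    if tau > 1.0:
        return math.inf  # no x satisfies F_hat(x) >= tau
    # smallest j with j/m >= tau; the tiny offset guards against float round-up
    j = max(1, math.ceil(tau * m - 1e-12))
    return sorted_xs[j - 1]


def conf_radius(t, n_arms, delta):
    """c_t(delta) in the assumed form of Equation 2.6, with K = n_arms."""
    return math.sqrt(math.log(math.pi ** 2 * t ** 2 * n_arms / (3.0 * delta)) / (2.0 * t))


def qpac(arms, tau, eps, delta, max_rounds=100_000):
    """Return (selected arm, rounds used), or (None, max_rounds) if nothing is selected."""
    n_arms = len(arms)
    active = list(range(n_arms))
    samples = [[] for _ in range(n_arms)]  # kept sorted for quantile look-ups
    for t in range(1, max_rounds + 1):
        for k in active:  # lines 4-5: one pull per active arm
            bisect.insort(samples[k], arms[k]())
        c = conf_radius(t, n_arms, delta)
        upper = {k: empirical_quantile(samples[k], tau + c) for k in active}
        x_minus = max(empirical_quantile(samples[k], tau - c) for k in active)  # line 6
        x_plus = max(upper.values())  # line 7
        for k in list(active):  # lines 8-13, iterating over a copy
            if upper[k] < x_minus:
                active.remove(k)  # discard k
            elif x_plus <= empirical_quantile(samples[k], tau + eps - c):
                return k, t  # select k and stop
    return None, max_rounds
```

For example, with two hypothetical arms sampling from $U(0,1)$ and $U(2,3)$, `qpac(arms, tau=0.5, eps=0.4, delta=0.1)` selects arm 1, whose median is larger; with $\epsilon = 0.4$ the selection test can only pass once $c_t(\delta)$ has shrunk to roughly $\epsilon/2$.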
2.3 Quantile and Superquantile Risk

The superquantile risk, known within the quantitative finance community as conditional value-at-risk (CVaR), quantifies the expected loss beyond a probabilistic threshold [3]. When a pdf exists, the superquantile is simply the conditional expectation above a given quantile threshold, $\mathbb{E}[X \mid X > q_\alpha]$, where $X$ is the random variable corresponding to portfolio loss and $q_\alpha$ is the quantile threshold at the desired level of risk aversion $\alpha$. That is, the conditional value-at-risk is the expected loss given that the loss falls in the worst $(1 - \alpha)$ fraction of outcomes. When $\alpha = 0$, the measure is neutral towards risk (the superquantile reduces to the mean), whereas when $\alpha = 1$, it reflects complete aversion towards risk. Quantile risk, more commonly known as value-at-risk (VaR), is used as a systemic risk metric within the financial engineering community and is given as the $\alpha$-quantile of the portfolio loss $X$ [21]. The interpretation is that portfolio $X$ has probability $1 - \alpha$ of incurring a loss of at least $q_\alpha$. Figure 2.1, taken from [6], further illustrates these concepts.



Figure 2.1. Illustration of Value-at-Risk and Conditional Value-at-Risk of 
the pdf of a Random Variable Y. Source: [6]. 
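As a small illustration of these definitions, the sketch below computes sample analogues of VaR and CVaR from a finite list of losses. It assumes the empirical $\alpha$-quantile $\inf\{x : \hat{F}(x) \geq \alpha\}$ as the VaR estimate and averages the losses strictly above it as the CVaR estimate, mirroring $\mathbb{E}[X \mid X > q_\alpha]$; the function names are hypothetical, and no interpolation or tie handling is attempted.

```python
import math


def empirical_var(losses, alpha):
    """Empirical alpha-quantile inf{x : F_hat(x) >= alpha}, for 0 < alpha < 1."""
    xs = sorted(losses)
    j = max(1, math.ceil(alpha * len(xs) - 1e-12))  # smallest j with j/n >= alpha
    return xs[j - 1]


def empirical_cvar(losses, alpha):
    """Mean of the losses strictly above the empirical VaR (sample E[X | X > q_alpha])."""
    q = empirical_var(losses, alpha)
    tail = [x for x in losses if x > q]
    return sum(tail) / len(tail) if tail else q  # no tail: fall back to the quantile


losses = list(range(1, 101))  # hypothetical losses 1, 2, ..., 100
print(empirical_var(losses, 0.9), empirical_cvar(losses, 0.9))  # 90 95.5
```

The CVaR estimate always lies at or above the VaR estimate, which is the ordering shown in Figure 2.1.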
CHAPTER 3:
Computational Methods 


3.1 Introduction 

Following our background overview in the preceding chapter, we now turn to the model formulation 
and analysis. 

Each arm represents an intelligence source, which produces one intelligence item per time period. The importance of an item is the value generated by sampling a source with respect to a specific request for information. Each time step yields a distinct document observation from the source, whose importance we model as a random variable; these random variables are independent and identically distributed for each arm.

Recall our motivation from the last chapter: a request for information requires a certain average importance value (the conditional expectation, or superquantile) for the documents that are passed on to a senior analyst, so a good source is one with a large probability of producing such documents. More specifically, for each source we set its conditional expected importance over a threshold (initially unknown) equal to the required average document importance (an input), and then seek to sample from the source whose quantile level at that unknown threshold is smallest. Our measure of performance is the expected regret relative to selecting the optimal source at each time step, where the regret is the difference in quantiles between the best source and the suboptimal sources. Our goal is therefore to produce algorithms with good regret convergence, which in our case means regret that grows logarithmically in the number of documents explored.

Table 3.1 summarizes the indexes, sets, and variables introduced in the description above.
Table 3.1. Parameters of the Model.

$s \in S$: Intelligence source $s$, a single element of the set of all sources $S$.
$t \in T$: The current time period $t$ in the finite time horizon $T$.
$X_{s,t}$: An observation from source $s$ in period $t$.
$\alpha_s \in (0, 1)$: The $\alpha$-quantile level of source $s$.
$k_{s,\alpha} \in (a, C)$: The importance threshold of source $s$ at quantile level $\alpha$; this value is found by the VaR algorithm.
$C \in (a, b)$: The desired intelligence request value.

Overarching indexes, sets, and variables used to describe our model of intelligence operations.


3.2 Selecting the Largest Quantile Risk Level 

In this section, we present the general model. We study the problem faced by an analyst who has a constraint on the superquantile risk for a set of candidate intelligence sources, meaning that

$$\mathbb{E}[X_s \mid X_s > k_{s,\alpha_s}] = C, \tag{3.1}$$

where $X_s$ is the random importance associated with intelligence source $s$, $k_{s,\alpha_s}$ is the quantile risk at level $\alpha_s$, and $C \in \mathbb{R}$ is the input constraint.

The constant $C$ (a model input) is the average importance of the intelligence in the documents that are passed to a senior analyst. The quantity $1 - \alpha_s(C)$ is the fraction of documents generated by source $s$ with quality at least $k_s(C)$ (unknown), where $k_s(C)$ is chosen so that these documents have conditional expected importance $C$. For a given $C$, the goal is to find the intelligence source that produces the largest fraction of items meeting the intelligence quality level; hence, the analyst wishes to find the source with the lowest $\alpha_s(C)$. In this case, a sample $X_{s,t}$ is the quality of the item generated by source $s$ in time step $t$. Later in this section we impose the condition $\mathbb{E}[X_s] < C$, meaning that the average quality of items generated by source $s$ is less than $C$; otherwise $\alpha_s(C)$ is 0, all items are passed on for further analysis, and there is no problem to solve. As $C$ is a qualitative constant, means, medians, and variances computed on an ordinal set of observations are invalid measures and cannot produce meaningful outcomes [10]. Critically, we want to solve for $\alpha_s(C)$, which requires first solving for the value of $k_{s,\alpha}$; this is the root problem. There are two important cases to consider:
1. A high $C$. Relative to the interval $(a, b)$, a high $C$ indicates an important intelligence request. Here, we wish to find the source(s) that generate items of high average importance. In this scenario, each source is likely to generate relatively few items over the threshold $k(C)$, so that finding the source for which $P(\text{item importance} > k(C))$ is largest is realistic. The latter is equivalent to finding the source with the highest $1 - \alpha_s(C)$, that is, the lowest $\alpha_s(C)$; see Equation 3.14.

2. A low $C$. Relative to the interval $(a, b)$, a low $C$ indicates a less important intelligence request. This is the converse of the preceding scenario: the flow of items is likely to be large, and the analyst is interested in finding the source for which $P(\text{item importance} > k(C))$ is smallest. Since $P(X_s > k_s(C)) = 1 - \alpha_s(C)$ by Equation 3.14, this means that $\alpha_s(C)$ is largest.

The solution of the root problem is also known as the buffered probability of exceedance (bPOE) [22], defined as the inverse of the conditional value-at-risk level; it is a generalization of the buffered probability of failure (bPOF), defined in [23] as one minus the inverse, at point zero, of the superquantile or conditional value-at-risk level.

We consider a framework with two cases. First, the analyst wishes to find the intelligence source with the largest or smallest threshold quantile (known as the value-at-risk, or quantile at level $\alpha$), given as the root $k_{s,\alpha_s}$ of Equation 3.1. Second, the analyst's objective is to identify the intelligence source with the largest or smallest level $\alpha_s = P(X_s \leq k_{s,\alpha_s})$, where the root $k_{s,\alpha_s}$ is set to satisfy Equation 3.1. In the former case, the level $\alpha_s$ plays no role in finding the quantile risk that satisfies the superquantile risk constraint, while in the latter $\alpha_s$ can be obtained from the root $k_{s,\alpha_s}$.

Without loss of generality, we work with the problem of finding the source(s) with the largest root $k$, as well as the one with the largest superquantile risk level $\alpha$. The problem of finding the arm with the smallest superquantile risk level or root is solved by applying our multi-armed bandit model to arms driven by the negated random variables $-X_s$. Note that the problem of finding the root of $C' - \mathbb{E}[X'_s \mid X'_s < k]$ with $\mathbb{E}[X'_s] > C'$ and $P(X'_s < C') > 0$ is identical to the situation considered here upon setting $X = -X'$ and $C = -C'$.

More formally, we consider a finite set of candidate arms $S = \{1, \ldots, S\}$. For each arm $s \in S$ there is a stochastic observation, defined by a random variable $X_s$. For the purposes of our model, we assume that $X_s$ has a continuous distribution, and thus a density, for each arm $s \in S$. The analyst observes independent and identically distributed (iid) samples $X_{s,1}, X_{s,2}, \ldots, X_{s,n}$ from a distribution with density $f_{X_s}(\cdot)$, where $k_s(C)$ is the root of

$$C = \mathbb{E}[X_s \mid X_s > k]. \tag{3.2}$$

The goal is to find the arm $s^* \in S$ with the largest root, $k_{s^*}(C) = \max_s k_s(C)$.

Three key assumptions are made for the remainder of this work:

A1. $C - \mathbb{E}[X_s] > \gamma > 0$, $\forall s \in S$.

A2. The random variables $X_{s,1}, X_{s,2}, \ldots, X_{s,n}$ have bounded support over $(a, b)$, $\forall s \in S$, with $-\infty < a < C < b < \infty$.

A3. The random variables $X_s$, for all $s \in S$, have a probability density function $f_{X_s}(\cdot)$ that is uniformly bounded below by $\xi > 0$.

Assumptions A1 and A2 ensure that the root $k_s(C)$ is well defined, whilst Assumption A3 is used to bound the error probability of the root estimator. In quantile estimation settings, a positive density is required in the neighborhood of the quantile to control the estimation error; in the superquantile risk setting, this assumption is further extended to the entire support [24].

For the remainder of this work, we drop the arm (source) index $s$ unless it is required to distinguish between arms. Following the work of [18], our idea is to adapt a sequential elimination approach, for which one needs to show that for $0 < \delta < 1$ there exists $\epsilon_n > 0$ such that

$$P(|\hat{k}_n - k(C)| > \epsilon_n) \leq \delta, \tag{3.3}$$

where $\hat{k}_n$ is the root estimator using $n$ iid samples. The analysis is further simplified by working exclusively with the root of the function

$$g(k) = \mathbb{E}[(X - C)\,I(X > k)], \tag{3.4}$$

which results from simplifying

$$C = \mathbb{E}[X \mid X > k] = \frac{\mathbb{E}[X\,I(X > k)]}{P(X > k)} \iff g(k) = 0. \tag{3.5}$$

Assumptions A1 and A2 guarantee that $\lim_{k \to -\infty} g(k) = \mathbb{E}[X] - C < 0$, and $g(\cdot)$ increases to attain its maximum at $k = C$, with $g(C) = \mathbb{E}[(X - C)\,I(X > C)] > 0$; indeed, $g'(k) = (C - k) f_X(k)$, which is positive for $k < C$ and negative for $k > C$. After attaining this maximum, $g(k)$ decreases monotonically towards 0 as $k$ approaches $b$. It follows that there is only one root $k(C) < C$ that solves $g(k) = 0$. Of note, a consequence of Assumption A3 is that

$$C - k(C) \geq \psi > 0 \tag{3.6}$$

for some $\psi > 0$; see Lemma 1 in Appendix A.1 for our proof. Intuitively, for a given sample size $n$, the error probability of the root estimation grows as $k(C)$ approaches $C$, because the slope $g'(k(C)) = (C - k(C)) f_X(k(C))$ becomes small and $g(\cdot)$ is flat near its root. Figure 3.1 illustrates the function $g(\cdot)$.



Figure 3.1. $g(\cdot)$ for a Truncated Normal Distribution ($\mu = 15$, $\sigma = 30$), Over the Interval $(-100, 100)$ with $C = 25$.


At each iteration of our sequential elimination algorithms, we draw iid samples $X_1, \ldots, X_n$, and the root is estimated by solving

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - C)\,I(X_i > k) = 0, \tag{3.7}$$

where the left-hand side of Equation 3.7 is interpreted as an empirical $g(\cdot)$ function. Moreover, we let the estimated root be given by

$$\hat{k}_n = \inf\left\{k > a : \frac{1}{n} \sum_{i=1}^{n} (X_i - C)\,I(X_i > k) \geq 0\right\}. \tag{3.8}$$


There are three cases:

1. $(1/n)\sum_{i=1}^{n} X_i < C$ and $(1/n)\sum_{i=1}^{n} I(X_i > C) > 0$, in which case the monotonicity of $(1/n)\sum_{i=1}^{n} (X_i - C)\,I(X_i > k)$ in $k$ on $(a, C)$ ensures that there is a unique root in $(a, C)$.

2. $(1/n)\sum_{i=1}^{n} X_i \geq C$, leading to $\hat{k}_n = a$.

3. $(1/n)\sum_{i=1}^{n} I(X_i > C) = 0$, in which case $\hat{k}_n = \max_{i=1,\ldots,n} X_i \leq C$.

These assumptions imply that the probability of Cases 2 or 3 decays to zero exponentially in $n$.
In Case 1, given the ordered samples $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$, root finding can be equivalently implemented as $\hat{k}_n = X_{(m^*)}$, where

$$m^* = \min\left\{m \geq 1 : \sum_{i=m}^{n} (X_{(i)} - C) > 0\right\}. \tag{3.9}$$

The average complexity of sorting the samples and finding $X_{(m^*)}$ is of order $O(n \log n)$ [25].
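The three cases and the sorted-sample rule of Equation 3.9 translate directly into code. The Python sketch below is an illustration only (the names `g_hat` and `root_estimate` are hypothetical, and samples are assumed to be plain floats): it returns $a$ in Case 2, the sample maximum in Case 3, and $X_{(m^*)}$ in Case 1, with $m^*$ found by a single pass over suffix sums after an $O(n \log n)$ sort.

```python
def g_hat(samples, C, k):
    """Empirical g(k) = (1/n) * sum (X_i - C) I(X_i > k), the left side of Eq. 3.7."""
    return sum(x - C for x in samples if x > k) / len(samples)


def root_estimate(samples, C, a):
    """Root estimate k_n following Cases 1-3 and Equation 3.9."""
    n = len(samples)
    if sum(samples) / n >= C:  # Case 2: the sample mean already meets C
        return a
    if max(samples) <= C:  # Case 3: no sample exceeds C
        return max(samples)
    xs = sorted(samples)  # Case 1: O(n log n) sort
    suffix, m_star = 0.0, None
    for i in range(n - 1, -1, -1):  # suffix sums of (X_(i) - C), scanned from the top
        suffix += xs[i] - C
        if suffix > 0:
            m_star = i  # ends as the smallest index with a positive suffix sum
    return xs[m_star]
```

For example, `root_estimate([0, 10, 20, 30, 50], 35, -100)` returns 30: the suffix sum starting at 30, $(30 - 35) + (50 - 35) = 10$, is the first positive one, whereas the one starting at 20 is $-5$.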


3.2.1 Algorithm 

We let $\Delta_s = k_{s^*}(C) - k_s(C) > 0$ for all arms $s \neq s^*$. Algorithm 4, the sequential quantile elimination algorithm shown below, initializes each root estimate $\hat{k}_{s,n}$ to $a$ and uses the threshold

$$\epsilon_n = \frac{b - a}{\xi\psi}\left(\frac{1}{2n}\log\frac{\pi^2 n^2 S}{3\delta}\right)^{1/2}, \quad n = 1, 2, \ldots, N, \tag{3.10}$$

to eliminate non-optimal arms (recall Equation 3.6 for the definition of $\psi$). Theorem 1 shows that the root estimation error $|\hat{k}_{s,n} - k_s(C)|$ exceeds $\epsilon_n$, for some $n$ and $s$, with probability at most $\delta$. Algorithm 4 is a standard implementation of the sequential elimination algorithm of [18], with modifications as shown.


Theorem 1 Under Assumptions A1, A2, and A3,

$$P\left(|\hat{k}_{s,n} - k_s(C)| \leq \epsilon_n, \ \forall n, \ \forall s = 1, \ldots, S\right) \geq 1 - \delta. \tag{3.11}$$


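Algorithm 4 Sequential Quantile Elimination Algorithm ($C$, $\sigma$, $a$, $b$, $S$, $\delta$)
Set $\mathcal{S} = \{1, \ldots, S\}$.
Set $\hat{k}_{s,n} = a$, $\forall s \in \mathcal{S}$, and $n = 1$.
while $|\mathcal{S}| > 1$ do
    for arm $s \in \mathcal{S}$ do
        Draw one sample from arm $s$ and compute $\hat{k}_{s,n}$
    for arm $s \in \mathcal{S}$ do
        if $\hat{k}_{\max,n} - \hat{k}_{s,n} > 2\epsilon_n$, where $\hat{k}_{\max,n} = \max_{s' \in \mathcal{S}} \hat{k}_{s',n}$, then
            $\mathcal{S} = \mathcal{S} \setminus \{s\}$
    Set $n = n + 1$
return the single arm remaining in $\mathcal{S}$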
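The following Python sketch illustrates the elimination logic of Algorithm 4 under explicit assumptions; it is not the MATLAB code of Appendix A.3. The threshold is taken as $\epsilon_n = \frac{b-a}{\xi\psi}\sqrt{\frac{1}{2n}\log\frac{\pi^2 n^2 S}{3\delta}}$ with $\xi$ and $\psi$ supplied by the caller as known constants, each arm is a hypothetical zero-argument sampler, and each root is re-estimated every round with the Case 1 to 3 rules. All function names are illustrative.

```python
import bisect
import math
import random


def root_from_sorted(xs, C, a):
    """Root estimate k_n from sorted samples xs (Cases 1-3 and Equation 3.9)."""
    if sum(xs) / len(xs) >= C:  # Case 2: sample mean already meets C
        return a
    if xs[-1] <= C:  # Case 3: no sample exceeds C
        return xs[-1]
    suffix, m_star = 0.0, None
    for i in range(len(xs) - 1, -1, -1):  # Case 1: smallest m with positive suffix sum
        suffix += xs[i] - C
        if suffix > 0:
            m_star = i
    return xs[m_star]


def epsilon_n(n, S, delta, a, b, xi, psi):
    """Elimination threshold in the form of Equation 3.10 (xi, psi assumed known)."""
    return (b - a) / (xi * psi) * math.sqrt(
        math.log(math.pi ** 2 * n ** 2 * S / (3.0 * delta)) / (2.0 * n))


def quantile_elimination(arms, C, a, b, delta, xi, psi, max_rounds=100_000):
    """Return (surviving arm, rounds), or (None, max_rounds) if several arms remain."""
    S = len(arms)
    active = list(range(S))
    samples = [[] for _ in range(S)]
    k_hat = {s: a for s in active}  # every root estimate starts at a
    for n in range(1, max_rounds + 1):
        for s in active:  # one new sample per active arm, then re-estimate its root
            bisect.insort(samples[s], arms[s]())
            k_hat[s] = root_from_sorted(samples[s], C, a)
        k_max = max(k_hat[s] for s in active)
        eps = epsilon_n(n, S, delta, a, b, xi, psi)
        active = [s for s in active if k_max - k_hat[s] <= 2.0 * eps]  # drop if gap > 2 eps
        if len(active) == 1:
            return active[0], n
    return None, max_rounds
```

With hypothetical arms $U(0, 1)$ and $U(0, 0.8)$ and $C = 0.75$, the true roots are 0.5 and 0.7, so the second arm is the one expected to survive. Passing $\xi = \psi = 1$ in such a run only keeps it short; those are not the true constants of these arms, so the $1 - \delta$ guarantee of Theorem 1 is not claimed for that setting.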


Since $P(|\hat{k}_{s,n} - k_s(C)| \leq \epsilon_n) \geq 1 - \delta$, Algorithm 4 selects the best arm with probability at least $1 - \delta$, as shown in [18]. Additionally, the work presented in [20] proves that the expected number of samples $\mathbb{E}[N_s]$ generated by a non-optimal arm $s \neq s^*$ satisfies

$$\begin{aligned}
\mathbb{E}[N_s] &\leq \sum_{n=1}^{\infty} P(\hat{k}_{\max,n} - \hat{k}_{s,n} \leq 2\epsilon_n) \\
&\leq \sum_{n=1}^{\infty} P(\hat{k}_{s^*,n} - \hat{k}_{s,n} \leq 2\epsilon_n) \\
&\leq u_s + \sum_{n=u_s+1}^{\infty} P(\hat{k}_{s^*,n} - \hat{k}_{s,n} \leq 2\epsilon_n),
\end{aligned} \tag{3.12}$$


for $u_s = \inf\{n : 4\epsilon_n < \Delta_s\}$. It easily follows that $\mathbb{E}[N_s] \leq u_s + 2\delta/S$, so the expected number of samples required for the non-optimal arms, $\sum_{s \neq s^*} \mathbb{E}[N_s]$, is at most $2\delta + \sum_{s \neq s^*} u_s$. Solving for $n > e$ such that $4\epsilon_n = \Delta_s$, with $\epsilon_n$ as in Equation 3.10, leads us to the dominant term in $\sum_{s \neq s^*} \mathbb{E}[N_s]$, which is

$$\sum_{s \neq s^*} \frac{8(b-a)^2}{(\xi\psi)^2\,\Delta_s^2} \log\frac{1}{\delta} \tag{3.13}$$

for any choice of $\delta$, provided $\delta$ is small; see [20] for rigorous proofs. As a result, relative to the more traditional problem of finding the bandit arm with the largest expected value, finding the bandit arm with the largest root increases the expected number of required observations by a factor of order $1/(\xi\psi)^2$.
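To see the $1/\Delta_s^2$ scaling numerically, the sketch below evaluates $\epsilon_n = \frac{b-a}{\xi\psi}\sqrt{\frac{1}{2n}\log\frac{\pi^2 n^2 S}{3\delta}}$ (the form of Equation 3.10 assumed here) and finds $u_s = \inf\{n : 4\epsilon_n < \Delta_s\}$ by a linear scan. The constants are hypothetical illustrations ($b - a = 1$, $\xi\psi = 1$, $S = 2$, $\delta = 0.1$), not values from the numerical chapter.

```python
import math


def epsilon_n(n, S, delta, a, b, xi, psi):
    """Threshold in the assumed form of Equation 3.10."""
    return (b - a) / (xi * psi) * math.sqrt(
        math.log(math.pi ** 2 * n ** 2 * S / (3.0 * delta)) / (2.0 * n))


def u_s(gap, S, delta, a, b, xi, psi, n_max=10 ** 7):
    """Smallest n with 4 * eps_n < gap (linear scan, capped at n_max)."""
    for n in range(1, n_max + 1):
        if 4.0 * epsilon_n(n, S, delta, a, b, xi, psi) < gap:
            return n
    raise ValueError("gap too small for n_max")


params = dict(S=2, delta=0.1, a=0.0, b=1.0, xi=1.0, psi=1.0)
u_big, u_small = u_s(0.4, **params), u_s(0.2, **params)
print(u_big, u_small, u_small / u_big)  # halving the gap multiplies u_s by somewhat more than 4
```

The ratio exceeds 4 only because the $\log n^2$ term inside $\epsilon_n$ grows slowly with $n$, which is consistent with $1/\Delta_s^2$ being the leading dependence in Equation 3.13.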


3.3 Selecting the Largest Superquantile Risk Level 

We now return to our initial problem scenario of finding the source with the largest superquantile risk level. For $F_s(\cdot)$ the distribution function of $X_s$, we let

$$\alpha_s(C) = F_s(k_s(C)), \tag{3.14}$$

where $k_s(C)$ is the root of

$$\mathbb{E}[X_s \mid X_s > k] = C. \tag{3.15}$$


The analyst's goal here is to find the arm $s^*$ with the largest $\alpha_s(C)$. The empirical estimator of $\alpha_s(C)$ is $\hat{\alpha}_{s,n}$, defined as

$$\hat{\alpha}_{s,n} = \frac{1}{n} \sum_{i=1}^{n} I(X_i < \hat{k}_n), \tag{3.16}$$

where $\hat{k}_n$ is as defined in Equation 3.8 for arm $s$. When $(1/n)\sum_{i=1}^{n} X_i < C$ and $(1/n)\sum_{i=1}^{n} I(X_i > C) > 0$, it follows from Equation 3.9 that $\hat{\alpha}_{s,n}$ can be read directly from the index $m^*$ of the sorted samples. Hence, this problem is computationally no more costly than that of finding the root $\hat{k}_n$.
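Equation 3.16 can be sketched on top of the same root estimate. The Python below is illustrative only: `alpha_hat` and `root_from_sorted` are hypothetical names, the root follows the Case 1 to 3 rules with $\hat{k}_n = X_{(m^*)}$ in Case 1, and ties are ignored because the $X_s$ are continuous. For samples from $U(0, 0.8)$ with $C = 0.75$, the true root is 0.7, so $\alpha_s(C) = 0.7/0.8 = 0.875$, which the estimate approaches as $n$ grows.

```python
import bisect
import random


def root_from_sorted(xs, C, a):
    """Root estimate k_n from sorted samples (Cases 1-3, Equation 3.9)."""
    if sum(xs) / len(xs) >= C:  # Case 2
        return a
    if xs[-1] <= C:  # Case 3
        return xs[-1]
    suffix, m_star = 0.0, None
    for i in range(len(xs) - 1, -1, -1):  # Case 1: smallest m with positive suffix sum
        suffix += xs[i] - C
        if suffix > 0:
            m_star = i
    return xs[m_star]


def alpha_hat(samples, C, a):
    """Empirical level (1/n) * sum I(X_i < k_n), Equation 3.16."""
    xs = sorted(samples)
    k_n = root_from_sorted(xs, C, a)
    return bisect.bisect_left(xs, k_n) / len(xs)  # count of samples strictly below k_n


rng = random.Random(0)
draws = [rng.uniform(0.0, 0.8) for _ in range(20_000)]
print(alpha_hat(draws, C=0.75, a=0.0))  # close to 0.875 for a large sample
```

Because the samples are already sorted for the root search, the count in Equation 3.16 is a single binary search, which is why this estimator costs no more than the root itself.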
Toward the goal of deriving a sequential elimination algorithm, we define the threshold


€fl 



(T) 



x max 


b - a 2 (b — a) + (b — C)/n) 

W 


(3.17) 


for $n = 1, 2, \ldots, N$. The maximum in Equation 3.17 arises from coupling the empirical distribution to the empirical $g(\cdot)$ function (cf. Equation 3.7); see the proof of Theorem 2 in Appendix A.2.


Intuitively, the difference between the true and empirical superquantile risk levels is large if at least one of the following three events occurs:

1. the root estimator significantly deviates from the true root,

2. the empirical $g(\cdot)$ function at the true root $k(C)$ significantly deviates from $g(k(C))$, or

3. the empirical distribution significantly deviates from the true distribution at the root $k(C)$.


As in Section 3.2, we assume $\Delta_s = \alpha_{s^*}(C) - \alpha_s(C) > 0$ for all arms $s \neq s^*$. Algorithm 5 uses the thresholds in Equation 3.17 to eliminate non-optimal arms. Theorem 2, whose proof appears in Appendix A.2, establishes the key step for our algorithm to function as prescribed.


Theorem 2 Under Assumptions A1, A2, and A3, for $\epsilon_n$ as in (3.17) we obtain that

$$P\left(|\hat{\alpha}_{s,n} - \alpha_s(C)| \leq \epsilon_n, \ \forall n, \ \forall s = 1, \ldots, S\right) \geq 1 - \delta. \tag{3.18}$$


3.3.1 Algorithm 

As a result of Theorem 2, we now present the sequential elimination algorithm for superquantile risk level selection.

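Algorithm 5 Sequential Superquantile Risk Level Elimination Algorithm ($C$, $\sigma$, $a$, $b$, $S$, $\delta$)
Set $\mathcal{S} = \{1, \ldots, S\}$.
Set $\hat{\alpha}_{s,n} = 0$, $\forall s \in \mathcal{S}$, and $n = 1$.
while $|\mathcal{S}| > 1$ do
    for arm $s \in \mathcal{S}$ do
        Draw one sample from arm $s$ and compute $\hat{\alpha}_{s,n}$
    for arm $s \in \mathcal{S}$ do
        if $\max_{s' \in \mathcal{S}} \{\hat{\alpha}_{s',n}\} - \hat{\alpha}_{s,n} > 2\epsilon_n$ then
            $\mathcal{S} = \mathcal{S} \setminus \{s\}$
    Set $n = n + 1$
return the single arm remaining in $\mathcal{S}$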
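A Python sketch of Algorithm 5 follows the same elimination pattern as Algorithm 4 but ranks arms by $\hat{\alpha}_{s,n}$. Because the exact threshold of Equation 3.17 is not reproduced here, the sketch takes the threshold as a caller-supplied function `eps(n)`; any concrete choice, such as the placeholder radius in the usage note, is an assumption for illustration and not Equation 3.17. The arms are hypothetical zero-argument samplers, and the function names are illustrative.

```python
import bisect
import math
import random


def root_from_sorted(xs, C, a):
    """Root estimate k_n from sorted samples (Cases 1-3, Equation 3.9)."""
    if sum(xs) / len(xs) >= C:  # Case 2
        return a
    if xs[-1] <= C:  # Case 3
        return xs[-1]
    suffix, m_star = 0.0, None
    for i in range(len(xs) - 1, -1, -1):  # Case 1: smallest m with positive suffix sum
        suffix += xs[i] - C
        if suffix > 0:
            m_star = i
    return xs[m_star]


def superquantile_elimination(arms, C, a, eps, max_rounds=100_000):
    """Keep arms whose alpha-hat is within 2*eps(n) of the best; return the survivor."""
    S = len(arms)
    active = list(range(S))
    samples = [[] for _ in range(S)]
    alpha = {s: 0.0 for s in active}  # initial alpha-hat = 0, as in Algorithm 5
    for n in range(1, max_rounds + 1):
        for s in active:  # every active arm holds exactly n sorted samples
            bisect.insort(samples[s], arms[s]())
            k_n = root_from_sorted(samples[s], C, a)
            alpha[s] = bisect.bisect_left(samples[s], k_n) / n  # Equation 3.16
        best = max(alpha[s] for s in active)
        active = [s for s in active if best - alpha[s] <= 2.0 * eps(n)]
        if len(active) == 1:
            return active[0], n
    return None, max_rounds
```

With hypothetical arms $U(0, 1)$ and $U(0, 0.8)$, $C = 0.75$, and the placeholder radius $\epsilon_n = \sqrt{\log(\pi^2 n^2 S / (3\delta)) / (2n)}$ with $S = 2$ and $\delta = 0.1$, the second arm ($\alpha = 0.875$ versus $0.5$) is the one expected to survive.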
As with Algorithm 4, Theorem 2 implies that the arm with the largest superquantile risk level $\alpha$ is chosen with probability at least $1 - \delta$. Furthermore, the expected total number of samples $\sum_{s \neq s^*} \mathbb{E}[N_s]$ observed from the non-optimal bandit arms satisfies

$$\sum_{s \neq s^*} \mathbb{E}[N_s] \leq 2\delta + \sum_{s \neq s^*} u_s \tag{3.19}$$

for $u_s = \inf\{n > e : 4\epsilon_n < \Delta_s\}$. For small $\delta > 0$ and by standard arguments, the dominant term in $\sum_{s \neq s^*} \mathbb{E}[N_s]$ is


32 max 


a 2 (b - a) 


v ' s+s 


(3.20) 


where the impact of the $(b - C)/n$ term in Equation 3.17 is of smaller order than that of $\log(1/\delta)$.
CHAPTER 4:
Numerical Examples 


Following the derivations in the preceding chapter, we present numerical examples for three primary distributions: the truncated normal, triangular, and uniform. We begin by validating our derivations and showing that we obtain numerically accurate estimates for our root function $g(\cdot)$. Following this, we apply each algorithm to each distribution, with a brief introduction to the parameters selected for the remainder of this chapter. At the conclusion of these numerical examples, we present implementations for extended-length trials, specifically long-run and high-sample trials. Combining these concepts, a high-dimensional data section examines the scalability of these algorithms up to a $10^8 \times 10^2$ matrix. We finish with an analysis of $\epsilon$, investigating the convergence of this threshold as well as its effect on the rate of elimination for each algorithm.

4.1 Implementation 

To contrast the effect that different distributions have on the rate of convergence, the input parameters were held constant throughout every implementation of both the quantile and superquantile elimination algorithms (defined in Chapter 3) within this chapter. These parameters were selected to illustrate convergence within a sufficient number of iterations. The inputs to the quantile and superquantile elimination algorithms are identical, with the mean of each arm, $\mu$, calculated based upon the interval of the underlying distribution, $[a, b]$: for all distributions, the arm means are set to $S$ equally spaced values over $[a, b]$. The number of new observations considered in each iteration, $n$, has been set sufficiently large to ensure timely convergence. This represents the consideration of multiple source items at each iteration, as opposed to a single item. While this assumption may not hold in all instances for an online implementation, it provides the numerical convergence properties we seek to demonstrate here. Note that Table 4.1 gives no information on the distribution of each arm. For the numerical examples presented in later sections, each arm has the same underlying distribution, but with a distinct $\mu$ in order to observe convergence. The standard deviation $\sigma$ remains constant across the arms of the bandit while $\mu$ varies. We have not mixed distributions across arms.

We note that the descriptions of both the quantile and superquantile elimination algorithms in Chapter 3 force the algorithm to continue until convergence has been achieved. However, for practical implementation, we impose an upper bound on the number of allowed iterations, max iterations. This practical bound allows us to exit the algorithm and observe the rate of elimination at distinct iterations. If the algorithm fully converges before this bound is reached, the standard stopping criterion given in Chapter 3 applies.

At the beginning of the algorithm, as shown in Appendices A.3 and A.4, the parameters $C$, $\sigma$, $a$, $b$, $S$, and $\delta$ are used to calculate the mean of each arm ($\mu$) and to construct the data structure on which the algorithm operates. We pull $n$ observations from each arm, drawn as random samples from that arm's underlying probability distribution, calculate our elimination criteria, and determine for each arm whether it is eliminated or selected as optimal. If neither case applies, the arm remains in consideration until the next iteration. This road map for the execution of the algorithm is identical for all arms within the system.

Appendices A.3 and A.4 list the outputs of these algorithms, updated at every iteration. The result matrix is a series of $S$ vectors containing the empirical estimate of our root function $g(\cdot)$; this is the value of each arm shown in Figures 4.1 to 4.4. epsilon is a vector of the values of $\epsilon$ and was used to produce Figures 4.11 and 4.12. verbose_arms is a vector that tracks the status of each arm, indicating which arms are currently active and which have been eliminated. The calculated means $\mu$, discussed above, are contained in the vector mu, a vector of $S$ linearly spaced values used for each arm throughout. The vector root_max tracks the best-observed arm in each iteration. In the limit, this vector consistently identifies the optimal arm, whereas initially a number of different arms are selected while the primary metric, result, stabilizes. The final parameter recorded is expected_samples, a vector of the expected number of samples remaining until convergence.

The parameters, selected to provide a visually striking difference for each distribution presented in this chapter, are as follows.
Table 4.1. Parameters Used for Numerical Examples.

Parameter         Value
$C$               25
$\sigma$          30
$a$               -100
$b$               100
$S$               25
$\delta$          0.1
$n$               $10^4$
max iterations    500

Where these parameters have not been used, this is specifically stated. These parameters represent our base case scenario for evaluation; max iterations is the algorithm stopping criterion.


4.1.1 Code Development 

Initially, each component of the algorithm was individually implemented and tested; knowing the theoretical results for the truncated normal case helped us to verify the code. Program builds first occurred within the R-3.3.2 environment, providing a good foundation for rapid development. Subsequently, the code was migrated to the MATLAB-R2016b and then MATLAB-R2017a platforms, for two primary reasons:

1. the ability to incorporate specific library files, and 

2. for implementation on Hamming. The Naval Postgraduate School has a high-performance 
computing system, in the form of a hybrid cluster supercomputer; this is the Hamming system. 
All numerical execution was conducted on this architecture. 

Upon obtaining numerically stable implementations, the modular system was discarded for a 
streamlined single-function script, reducing node communication requirements between modules. 
The final version of the implemented code for both the VaR and CVaR sequential elimination 
algorithms can be seen in Appendix A.3 and A.4. 

4.1.2 Truncated Normal Distribution 

During scoping of this topic for research, we had intended to implement both algorithms in the case of unbounded support through the use of distributions such as the classic normal. This concept was refined early on to consider only bounded distributions, and as a result, we use the truncated normal as our base case scenario. Importantly, this still allows closed-form solutions to be derived, ensuring that our measurement and assessment do not become overly dependent on the numerical implementation alone.

The truncated normal yields the lowest rate of arm elimination for our algorithms. When we compare Figures 4.1 and 4.2 with those for the other distributions (Figures 4.3 to 4.6), it is evident that more arms remain at the termination of the algorithm than in both the triangular and uniform cases; this is discussed in detail in later sections. At this resolution, it is not possible to tell from the plot whether any bandit arm has been eliminated, because in a setting as dense as this one, elimination cannot be seen. To determine the proportion of eliminated arms, we interrogate the output vector that tracks the elimination of arms. While we observe convergence to the mean of each arm, when compared to the theoretical results derived in Chapter 3, a far greater number of iterations is required to reach full algorithmic convergence and identify the correct arm with a probability of $1 - \delta$. This happens because the value of $\xi$ is very small (see below), meaning that the thresholds $\epsilon_n$ (of order $1/\xi$) are large. The numerical examples suggest that the thresholds $\epsilon_n$ as presented in the previous chapter are too conservative. Figure 4.1 depicts the standard setting for 500 iterations with an underlying truncated normal distribution; the value on the $y$-axis is the quantile level associated with the threshold $C$. When juxtaposed with Figure 4.2, we note that while the elimination pattern and convergence properties are similar, the $y$-axis has a strikingly different scale. In Figure 4.2, we are dealing with the basic quantile setting, and as such, the $y$-axis represents the raw value $X_{s,t}$. Each line in Figures 4.1 to 4.6 indicates a distinct arm (or source) under consideration, as given in the model description in Chapter 3. The value of each arm in every iteration is the solution of our empirical root equation for $g(\cdot)$ (Equation 3.7), described in the previous chapter.



CVaR Elimination for the Truncated Normal.
Figure 4.1. Implementation of Algorithm 5.

VaR Elimination for the Truncated Normal.
Figure 4.2. Implementation of Algorithm 4. (Horizontal axis: Algorithm Iteration (count).)
Regarding the derivation of the $g(\cdot)$ function for the truncated normal, we proceed as follows. We solve for $k$ in

$$0 = \mathbb{E}[(X - C)\,I(X > k)] = \mathbb{E}[X\,I(X > k)] - C\,P(X > k). \tag{4.1}$$

For $X$ distributed as a truncated normal between $a$ and $b$, with parameters $\mu \in (a, b)$ and $\sigma^2$ (the mean and variance of the parent normal distribution), the CDF is


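$$P(X \leq k) = \frac{\Phi\left(\frac{k-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}$$

for $k$ between $a$ and $b$, where $\phi$ and $\Phi$ are the pdf and CDF of a standard normal distribution $N(0, 1)$, respectively. Hence,

$$P(X > k) = \frac{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{k-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}.$$

The pdf is given as

$$f(x) = \frac{1}{\sigma}\,\frac{\phi\left(\frac{x-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)},$$

and the values of $\xi$ used in the algorithm are capped at the minimum of this pdf over $[a, b]$, which is attained at an endpoint because $\mu \in (a, b)$:

$$\xi = \frac{1}{\sigma}\,\frac{\min\left\{\phi\left(\frac{a-\mu}{\sigma}\right), \phi\left(\frac{b-\mu}{\sigma}\right)\right\}}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}.$$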

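For the term $\mathbb{E}[X\,I(X > k)]$ in Equation 4.1,

$$\mathbb{E}[X\,I(X > k)] = \int_k^b x f(x)\,dx = \frac{\sigma\left[\phi\left(\frac{k-\mu}{\sigma}\right) - \phi\left(\frac{b-\mu}{\sigma}\right)\right] + \mu\left[\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{k-\mu}{\sigma}\right)\right]}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}.$$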



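From here, $0 = \mathbb{E}[X\,I(X > k)] - C\,P(X > k)$; multiplying through by the common denominator $\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$ leads to

$$0 = \sigma\left[\phi\left(\frac{k-\mu}{\sigma}\right) - \phi\left(\frac{b-\mu}{\sigma}\right)\right] + \mu\left[\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{k-\mu}{\sigma}\right)\right] - C\left[\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{k-\mu}{\sigma}\right)\right], \tag{4.2}$$

which is solved numerically (in MATLAB) for $k$, the root of the function $g(\cdot)$ defined in Chapter 3.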
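Equation 4.2 can be solved with any bracketing root finder. The Python sketch below uses plain bisection on $(a, C)$, where the right-hand side of Equation 4.2 is negative at $k = a$ (by Assumption A1) and positive at $k = C$ (because $g(C) > 0$); it is a stand-in for the MATLAB solver, whose exact routine is not specified here, and the function names are hypothetical. It also evaluates the cap on $\xi$ given above. The parameters in the usage lines match Figure 3.1 ($\mu = 15$, $\sigma = 30$, $a = -100$, $b = 100$, $C = 25$).

```python
import math


def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def h(k, mu, sigma, b, C):
    """Right-hand side of Equation 4.2 (g(k) times the normalizing constant)."""
    kap, bet = (k - mu) / sigma, (b - mu) / sigma
    return sigma * (phi(kap) - phi(bet)) + (mu - C) * (Phi(bet) - Phi(kap))


def truncnorm_root(mu, sigma, a, b, C, iters=200):
    """Bisection on (a, C): h(a) < 0 under A1 and h(C) > 0."""
    lo, hi = a, C
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid, mu, sigma, b, C) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def xi_cap(mu, sigma, a, b):
    """Minimum of the truncated normal pdf on [a, b], attained at an endpoint."""
    z = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
    return min(phi((a - mu) / sigma), phi((b - mu) / sigma)) / (sigma * z)


k = truncnorm_root(mu=15.0, sigma=30.0, a=-100.0, b=100.0, C=25.0)
print(k, xi_cap(15.0, 30.0, -100.0, 100.0))
```

For these parameters the root comes out at around $-11$ and the cap on $\xi$ is roughly $10^{-5}$, which is consistent with the remark above that the value of $\xi$ is very small for the truncated normal.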
4.1.3 Triangular Distribution

Our second example is a modified triangular distribution, with parameters as given in Table 4.1. To ensure numerical stability at the end points of the algorithm when eliminating arms from contention, the parameters $\psi$ and $\xi$ require strictly positive values. This requires a modified triangular distribution that sits on top of a uniform distribution, to ensure we do not enter the case of $\psi \leq 0$ or $\xi \leq 0$. The triangular distribution represents the intermediate case of convergence for our algorithms. This is to be expected, as $\xi$ is constant for the uniform case and too small in the truncated normal scenario. We note that both algorithms very quickly eliminate arms from consideration. While Algorithm 5 has not yet converged at 500 iterations, convergence was observed for Algorithm 4 in just over half of the maximum number of allowed iterations. Figure 4.3 depicts the standard setting (given in Table 4.1) for 500 iterations with an underlying triangular distribution. As in the previous section, the value on the $y$-axis is the quantile level associated with the threshold $C$. When juxtaposed with Figure 4.4, we note that while the elimination pattern and convergence properties are similar, the $y$-axis has a strikingly different scale. In Figure 4.4, we are dealing with the basic quantile setting, and as such, the $y$-axis represents the raw value $X_{s,t}$.



CVaR Elimination for the Triangular Distribution.
Figure 4.3. Implementation of Algorithm 5. (Horizontal axis: Algorithm Iteration (count).)

VaR Elimination for the Triangular Distribution.
Figure 4.4. Implementation of Algorithm 4. (Horizontal axis: Algorithm Iteration (count).)


4.1.4 Uniform Distribution 

The final distribution under consideration is the uniform. This example represents the best-case 
convergence of all distributions used to illustrate each algorithm. The bounds for each uniform are 
given by 

I b - a b — a\ 

I bs --—, y s -I--— I, V b > a, 
where $\mu_s$ is obtained from the input parameter $S$, with all other parameters as per Table 4.1; see Algorithm 4 in Appendix A.3 and Algorithm 5 in Appendix A.4. This provides the separation each algorithm needs to differentiate between the $S$ uniforms, whose means are linearly spaced by construction.

We observe here the same elimination trend as was discussed previously for the triangular distribution: the VaR elimination algorithm liberally eliminates arms, whereas the CVaR elimination algorithm retains arms for further consideration for longer periods. At first glance, Figures 4.5 and 4.6 appear to show different numbers of arms under consideration; however, this is not the case. As the convergence for the uniform is the fastest of our three examples, we observe a rapid elimination of arms in both algorithms, and a near-instant one in the case of the VaR elimination algorithm. Relatively few observations are required for this algorithm to discard arms that are non-optimal. What we observe in the limit of this execution is the arm with the largest $g(\cdot)$ value (i.e., the highest line).

As in both of our previous cases, the CVaR elimination algorithm did not converge in the given number of iterations, with 3 of 25 arms remaining in contention. As will be seen in the following section, the elimination of most non-optimal arms occurs quite quickly; however, converging to the optimal arm among the last few remaining arms is where most of the time is spent for each algorithm. Figure 4.5 depicts the standard setting for 500 iterations with an underlying uniform distribution. The value on the $y$-axis is the quantile level associated with the threshold $C$. When juxtaposed with Figure 4.6, we note that while the elimination pattern and convergence properties are similar, the $y$-axis has a strikingly different scale. In Figure 4.6, we are dealing with the basic quantile setting, and as such, the $y$-axis represents the raw value $X_{s,t}$.




Figure 4.5. Implementation of Algorithm 5: CVaR Elimination for the Uniform Distribution.

Figure 4.6. Implementation of Algorithm 4: VaR Elimination for the Uniform Distribution.












4.2 Extended Length Implementation 

We investigated the effect of increasing the number of allowed iterations, as well as the number of samples drawn in each iteration. The aim of this analysis was to understand the behavior of the algorithms in their numerical limit.

4.2.1 Long-Run Trials 

To explore algorithm performance with a greater number of iterations, the CVaR elimination algorithm was executed for each distribution with only a single modification to the standard parameters: the maximum number of allowed iterations was increased to 2,500. While convergence to a single optimal arm was again not observed, fewer arms remain under consideration at this point. This illustrates the more extreme case of the logarithmic expected number of iterations derived in the previous chapter. No formal timing was undertaken. Even so, each algorithm took approximately 0.75 hours to execute 500 iterations, whereas 2,500 iterations took approximately 41 hours. This is a super-linear increase, in which each subsequent iteration takes longer to complete. It is consistent with each iteration re-sorting all of the observations accumulated so far, as in the listings of Appendices A.3 and A.4. At the completion of this algorithm for each distribution, 25,000,000 samples per arm (2,500 iterations of $n = 10^4$ samples each) had been used to create the data in Figures 4.7(a) to 4.7(c). From our derivation in the previous chapter, we calculate that approximately 5,800,000 more samples for each distribution would be required to reach full convergence to the optimal arm(s) with a probability of at least $1 - \delta$, where $\delta = 0.1$. This estimate follows from the derivation of Theorem 2.
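Listing 2 computes a sample-count estimate at every iteration (`expected_samples`). The Python sketch below is a direct transcription of that line. The function names are hypothetical. The parameter values in the check are illustrative rather than Table 4.1 values. We have not confirmed that the 5,800,000 figure quoted above comes from exactly this expression.

```python
import math


def cvar_scale(N, a, b, C, zeta, psi):
    """max{(b-a)/(zeta psi^2), (2(b-a) + (b-C)/N)/psi, 1}, as in Listing 2."""
    return max((b - a) / (zeta * psi ** 2), (2 * (b - a) + (b - C) / N) / psi, 1.0)


def cvar_expected_samples(eps, N, S, delta, a, b, C, zeta, psi):
    """Transcription of the expected_samples line of Listing 2 (Algorithm 5).

    eps is the current threshold and N = obs * (i + 1) the samples per arm;
    (S - 1) counts the arms other than the leader.
    """
    scale = cvar_scale(N, a, b, C, zeta, psi)
    return (32 * scale ** 2 * math.log(math.pi ** 2 * S / (3 * delta))
            * (S - 1) * (4 * eps) ** -2)
```

The estimate grows with $1/\epsilon^2$ and with the square of the conservative scale factor. This is why the parameters $\psi$ and $\zeta$ dominate the run length.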
Figure 4.7. Implementation of Algorithm 5 for Multiple Distributions: 2,500 iterations of $n = 10^4$ samples for each iteration. Panels: Truncated Normal (a), left; Triangular (b), right; Uniform (c). x-axis: Algorithm Iteration (count).


4.2.2 High Sample Trials 

In contrast to the previous section, the total number of iterations is unchanged for this analysis; instead, the number of additional observations drawn in each iteration is increased. With $10^5$ rather than $10^4$ observations sampled at each iteration, the CVaR elimination algorithm now converges for both distributions shown. Due to the stochastic nature of the underlying root-finding problem, Figure 4.8(b) depicts convergence in only 18 iterations, as opposed to more than 250 in Figure 4.8(c). In each case, the stopping criterion is met and the algorithm promptly exits without further iterations; this is the expected behavior. We also increased the number of arms under consideration to 100, a four-fold increase over our other numerical examples. In this setting, Figure 4.8(a) shows the CVaR elimination algorithm converging in a mere 160 iterations for the triangular distribution. Additionally, for this replication a non-optimal arm is selected as optimal when the algorithm stops. This is one of the $100\delta$ percent of cases in which the optimal arm is not selected, an event that occurs with probability at most $\delta$. This again illustrates the stochastic nature of the algorithm and the potential for incorrect selection of the optimal arm.




Figure 4.8. Implementation of Algorithm 5 for Multiple Distributions, with $n = 10^5$ samples per iteration. Panels: Triangular distribution (a), left, and (b), right; Uniform distribution (c). x-axis: Algorithm Iteration (count).


4.3 High Dimensional Data 

Figure 4.9 illustrates a classic scalability limitation, here specifically of our implemented sequential elimination algorithms. Because every observation is retained in a large, dense (non-sparse) matrix, our code gradually consumes more resources until either imposed or physical limits are reached. This is an important consideration for future work, particularly for implementation on live database systems; see Chapter 5 for further discussion.
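The size quoted in the caption of Figure 4.9 can be checked with simple arithmetic. This assumes each element is an 8-byte MATLAB double stored densely. The function name is ours.

```python
def dense_matrix_bytes(rows, cols, bytes_per_element=8):
    """Storage for a dense rows x cols array (8 bytes per MATLAB double)."""
    return rows * cols * bytes_per_element


iteration, obs, arms = 813, 10 ** 5, 100
rows = iteration * obs                    # 81,300,000 observations per arm
needed = dense_matrix_bytes(rows, arms)   # bytes for the whole array
print(f"{rows:,} x {arms} doubles = {needed / 1e9:.2f} GB")
# prints: 81,300,000 x 100 doubles = 65.04 GB
```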


81
81.1000
81.2000
81.3000
Out of memory. Type HELP MEMORY for your options.

Error in VaR_truncnorm (line 48)
x(s, 1:(i + 1) * obs) = sort(y(s, 1:(i + 1) * obs), 'descend');

Memory error for $n = 10^5$ observations with 1,000 iterations, occurring at iteration 813 (a matrix of size 81,300,000 × 100).

Figure 4.9. Memory Error on the Hamming Architecture.


4.4 Algorithm Verification 

Figure 3.1 in the previous chapter shows the true solution to the stochastic root-finding problem posed, as the blue line. In Figure 4.10, we have superimposed the empirical estimate of $g(\cdot)$ onto this plot as the orange line. The result is a sensitivity analysis of $g(\cdot)$ over values of $k$ in the interval $[-100, 100]$. In this instance, $n = 100$ points were evaluated to identify potential errors within our root-finding function, the core of our two algorithms. Even over a domain as large as the one presented, the empirical estimate of $g(\cdot)$ remains within tolerable bounds of the true solution. The empirical solution is notably conservative: neither its tail decay nor its peak spans the full range of the true root solution. Our empirical $g(\cdot)$ has a lower maximum and a higher minimum than the true solution given by Equation 4.2. Figure 4.10 shows the difference between this true root solution and our empirical solution for 100 linearly spaced data points.
Figure 4.10. Empirical Estimate of $g(\cdot)$ Versus the Root Equation Solution for Values of $k \in \{-100, \ldots, 100\}$, Where $n = 100$, $C = 25$, $\mu = 15$, and $\sigma = 30$.
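The comparison in Figure 4.10 can be sketched in Python with the standard library only. `g_true` re-expresses the `fzero` objective from Listing 1 and divides it by the truncation mass. This puts it on the same scale as the empirical average without moving the root. `g_empirical` is $\frac{1}{n}\sum_i (X_i - C)I(X_i > k)$. The truncation bounds $a = -100$ and $b = 100$ are assumptions, as is the separate sample size. We also do not know that Equation 4.2 is normalized in exactly this way.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def g_true(k, mu, sigma, a, b, C):
    """E[(X - C) 1{X > k}] for X ~ N(mu, sigma^2) truncated to [a, b]."""
    k = min(max(k, a), b)                       # constant outside [a, b]
    alpha, beta, kappa = (a - mu) / sigma, (b - mu) / sigma, (k - mu) / sigma
    mass = norm_cdf(beta) - norm_cdf(kappa)
    numerator = sigma * (norm_pdf(kappa) - norm_pdf(beta)) + (mu - C) * mass
    return numerator / (norm_cdf(beta) - norm_cdf(alpha))


def g_empirical(k, samples, C):
    """(1/n) * sum over samples of (X_i - C) 1{X_i > k}."""
    return sum(x - C for x in samples if x > k) / len(samples)


def truncated_normal(mu, sigma, a, b, n, rng):
    """Rejection sampler for N(mu, sigma^2) restricted to [a, b]."""
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if a <= x <= b:
            out.append(x)
    return out


def bisect_root(f, lo, hi, tol=1e-10):
    """Bisection, assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    mu, sigma, C, a, b = 15.0, 30.0, 25.0, -100.0, 100.0   # a, b assumed
    samples = truncated_normal(mu, sigma, a, b, 1000, random.Random(0))
    grid = [a + j * (b - a) / 99 for j in range(100)]      # 100 points
    gap = max(abs(g_true(k, mu, sigma, a, b, C) - g_empirical(k, samples, C))
              for k in grid)
    root = bisect_root(lambda k: g_true(k, mu, sigma, a, b, C), a, C)
    print(f"k(C) = {root:.3f}, max |g_true - g_empirical| = {gap:.3f}")
```

The bisection bracket $[a, C]$ mirrors the `fzero` interval used in the listings. On that interval $g$ is increasing, because its derivative is $(C - k)f(k) > 0$.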


4.5 Convergence of Epsilon 

Figures 4.11 and 4.12 depict the behavior of the threshold parameter $\epsilon$ as a function of the number of algorithm iterations. We consider the truncated normal case, for which $\epsilon$ is the largest among the three distributions considered, and hence the worst case. An arm is eliminated only when its statistic trails the leader by more than $2\epsilon$. A relatively large $\epsilon$ therefore potentially slows sequential elimination relative to the triangular and uniform distributions.

Comparing $\epsilon$ directly for the VaR and CVaR elimination algorithms in Figure 4.11, we observe that both algorithms display the same monotonically decreasing behavior, albeit with different magnitudes at each iteration. The long-run behavior, shown in Figure 4.12, further illustrates these properties of $\epsilon$. As the number of iterations grows large, $\epsilon$ approaches 0, ensuring that the non-optimal arm(s) are eventually eliminated. The long run was executed only for the superquantile elimination algorithm, so no comparison analogous to Figure 4.11 exists.
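The two $\epsilon$ schedules compared in Figures 4.11 and 4.12 are the `epsilon` lines of Listings 1 and 2. The Python sketch below transcribes them; the function names and parameter values are hypothetical. It makes the monotone decrease towards 0 easy to check.

```python
import math


def var_epsilon(N, S, delta, a, b, zeta, psi):
    """epsilon line of Listing 1 (Theorem 1) after N observations per arm."""
    return (math.sqrt(0.5 / N * math.log(math.pi ** 2 * N ** 2 * S / (3 * delta)))
            * (b - a) / (zeta * psi))


def cvar_epsilon(N, S, delta, a, b, C, zeta, psi):
    """epsilon line of Listing 2 (Theorem 2) after N observations per arm."""
    scale = max((b - a) / (zeta * psi ** 2), (2 * (b - a) + (b - C) / N) / psi, 1.0)
    return math.sqrt(2.0 / N * math.log(math.pi ** 2 * N ** 2 * S / delta)) * scale


if __name__ == "__main__":
    params = dict(S=25, delta=0.1, a=0.0, b=100.0, zeta=0.01, psi=5.0)  # hypothetical
    for i in (0, 99, 499, 2499):
        N = 10 ** 4 * (i + 1)
        print(i + 1, var_epsilon(N, **params), cvar_epsilon(N, C=25.0, **params))
```

Both schedules shrink roughly like $\sqrt{\log N / N}$, which matches the behavior shown in the figures. Only the constant factor differs between the two algorithms.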
Figure 4.11. Numerical Convergence of $\epsilon$: Comparison of VaR and CVaR for the parameter $\epsilon$ (y-axis: Log of Epsilon).

Figure 4.12. Long-run Numerical Convergence of $\epsilon$: Numerical depiction of the parameter $\epsilon$ over 2,500 iterations.
CHAPTER 5:
Concluding Remarks 


In this final chapter, we discuss open issues and future work related to the problem presented in this thesis. We do so critically, by addressing its limitations, and constructively, by identifying directions that support further development of this field for intelligence processing.

We have two main recommendations. First, once stable and scalable implementations of the algorithms are developed, they must be tested on known data sets with expected outcomes. This is key to verifying and validating outputs on live data, rather than relying solely on the numerical guarantees provided within this thesis. Second, and more importantly from a long-term standpoint, the elimination algorithms should be implemented within an intelligence organization to enhance its analysis capability: this work is a force multiplier.

There exist great opportunities for future work as a result of this thesis, both in the applied and 
theoretical realms. Two main areas have been identified for future work in the applied domain. The 
first is extending the algorithm to work on real data sets, such as stock portfolio data. This will verify 
and validate the theoretical performance on known data, whilst negating the immediate requirement 
to parallelize the algorithm. The second identified applied area is regarding parallelization and 
scaling of the algorithm to handle large and non-sparse data sets. This is of critical importance in 
any real-world application. 

Three theoretical opportunities were identified for continuing this work. The first aligns closely with the parallelization advancement discussed above. For very large matrices, on the order of $10^9$ elements, the current practice of storing every observation is not practical. Work needs to be undertaken to determine when it is appropriate to remove observations that are no longer required. These could be replaced with a tuple of data containing index positions, summary statistics, and weightings. Eliminating these additional data points will significantly reduce the runtime and storage requirements.
The second area for improvement concerns the proofs that yield the parameters $\psi$ and $\zeta$. These parameters are quite conservative in their currently implemented form. However, we observe that each could be optimized for various distributions, as well as for empirical data, and they are critical to improving the runtime of each algorithm. The third, related opportunity is to extend the results to underlying distributions with an unbounded domain, as our work so far requires a bounded-domain distribution assumption. Removing this requirement will allow observations from distributions with infinite domains, such as the classic normal distribution.
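Below is one hypothetical way to realise the summary-statistics idea above, sketched in Python under the bounded-domain assumption. Fixed-width bins over $[a, b]$, each holding a count and a sum, replace the raw observations. This is our illustration, not a design from the thesis. The class name and bin count are assumptions. The root it returns is accurate only to about one bin width, and only when the bins around the root are non-empty.

```python
from itertools import accumulate


class BinnedArm:
    """Fixed-memory summary of one arm's observations on [a, b] (hypothetical).

    Each of `bins` equal-width bins keeps a count and a running sum, so
    memory is O(bins) instead of O(observations).
    """

    def __init__(self, a, b, bins):
        self.a, self.b, self.bins = a, b, bins
        self.width = (b - a) / bins
        self.counts = [0] * bins
        self.sums = [0.0] * bins

    def add(self, x):
        j = min(max(int((x - self.a) / self.width), 0), self.bins - 1)
        self.counts[j] += 1
        self.sums[j] += x

    def approximate_root(self, C):
        """Lower edge of the lowest bin at which the running sum of (x - C),
        accumulated from the top bin down, is still positive (a if never)."""
        running, root = 0.0, self.a
        for j in range(self.bins - 1, -1, -1):
            running += self.sums[j] - C * self.counts[j]
            if running > 0:
                root = self.a + j * self.width
        return root


def exact_root(samples, C, a):
    """Sort-based empirical root, as in the listings, for comparison."""
    xs = sorted(samples, reverse=True)
    m = sum(1 for partial in accumulate(x - C for x in xs) if partial > 0)
    return xs[m - 1] if m > 0 else a


if __name__ == "__main__":
    data = [0.005 + 0.01 * i for i in range(10_000)]   # evenly spread stand-in data
    arm = BinnedArm(0.0, 100.0, 1000)
    for x in data:
        arm.add(x)
    print(arm.approximate_root(75.0), exact_root(data, 75.0, 0.0))
```

The trade-off is resolution for memory. Halving the bin width doubles the storage per arm, but storage no longer grows with the number of iterations.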

We have studied a resource allocation problem in an intelligence setting. Our aim was to enhance efficiency within the first two stages of the intelligence cycle and thereby improve the quality of the intelligence items considered by analysts. We created two algorithms to find the source(s) that produce the largest fraction of relevant items with respect to a request for information. More generally, this thesis presented a new approach to identifying the arm(s) with the largest or smallest VaR or CVaR risk under a loss constraint. This problem is important not only in intelligence applications, but also in marketing and finance, as discussed. Some readers may note that definitive conclusions are not presented within this work. This is entirely intentional: the further work described in this chapter is required before a critical mass is achieved in this research endeavour. Our contribution has set the conditions for further advancements to be made.
APPENDIX. Mathematical Proofs and Algorithm Code


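A.1 Proof of Theorem 1

Let $\epsilon > 0$ and suppose first that $k_n < k(C) - \epsilon$. Then, from (3.8),

$$
\begin{aligned}
0 &< \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k_n) \\
&= \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(k_n < X_i \le k(C)) + \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C)) \\
&\le \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(k(C) - \epsilon < X_i \le k(C)) + \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C)),
\end{aligned}
$$

as $X_i < C$ on the event $\{X_i \le k(C) - \epsilon\}$. It follows that, since $C > k(C)$,

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C))
&> \frac{1}{n}\sum_{i=1}^{n}(C - X_i)\,I(k(C) - \epsilon < X_i \le k(C)) \\
&\ge (C - k(C))\,\frac{1}{n}\sum_{i=1}^{n} I(k(C) - \epsilon < X_i \le k(C)).
\end{aligned}
$$

Then it must hold that

$$
\begin{aligned}
P\big(k_n < k(C) - \epsilon\big)
&\le P\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C)) - (C - k(C))\,\frac{1}{n}\sum_{i=1}^{n} I(k(C) - \epsilon < X_i \le k(C)) > 0\right) \\
&\le \exp\left(-2n\left(\frac{(C - k(C))\,P(k(C) - \epsilon < X_i \le k(C))}{b - a}\right)^{2}\right),
\end{aligned}
$$

since $E[(X - C)I(X > k(C))] = 0$, by Hoeffding's Lemma. Hence, by Assumption A3 and Lemma 1 below,

$$
P\big(k_n < k(C) - \epsilon\big) \le \exp\left(-2n\psi^{2}\epsilon^{2}\zeta^{2}/(b - a)^{2}\right). \tag{A.1}
$$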
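In proving the other direction, if $k_n > k(C) + \epsilon$, then Equation 3.8 results in

$$
\frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C) + \epsilon) < 0 < \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k_n),
$$

where this covers the third possibility for the root $k_n$ discussed in Chapter 3.

Also, since $E[(X - C)I(X > k(C))] = 0$,

$$
\begin{aligned}
E[(X - C)\,I(X > k(C) + \epsilon)] &= E[(C - X)\,I(k(C) < X \le k(C) + \epsilon)] \\
&\ge (C - k(C))\,P(k(C) < X \le k(C) + \epsilon) \\
&\ge \psi\epsilon\zeta,
\end{aligned} \tag{A.2}
$$

by Assumption A3 and Lemma 1. It then follows that

$$
\begin{aligned}
P\big(k_n > k(C) + \epsilon\big)
&\le P\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C) + \epsilon) < 0\right) \\
&= P\left(E[(X - C)\,I(X > k(C) + \epsilon)] - \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C) + \epsilon) > E[(X - C)\,I(X > k(C) + \epsilon)]\right) \\
&\le P\left(E[(X - C)\,I(X > k(C) + \epsilon)] - \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C) + \epsilon) > \psi\epsilon\zeta\right) \\
&\le \exp\left(-2n\left(\frac{\psi\epsilon\zeta}{b - a}\right)^{2}\right),
\end{aligned} \tag{A.3}
$$

by (A.2) and Hoeffding's Lemma. In summary, we see that

$$
P\big(|k_n - k(C)| > \epsilon\big) \le 2\exp\left(-2n\left(\frac{\psi\epsilon\zeta}{b - a}\right)^{2}\right).
$$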


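From here, the results are input into the sequential elimination approach of [26] in order to obtain the elimination algorithm, as will be shown. For $0 < \delta < 1$ selected by the agent, set

$$
P\big(|k_n - k(C)| > \epsilon_n\big) \le 2\exp\left(-2n\left(\frac{\psi\epsilon_n\zeta}{b - a}\right)^{2}\right) = \frac{6\delta}{\pi^{2}n^{2}S}.
$$

Solving for $\epsilon_n$,

$$
\epsilon_n = \frac{b - a}{\psi\zeta}\sqrt{\frac{1}{2n}\log\left(\frac{\pi^{2}n^{2}S}{3\delta}\right)}.
$$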
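Thus, for any $n = 1, 2, \ldots$ and $\epsilon_n$ as given above,

$$
P\big(|k_{s,n} - k_s(C)| > \epsilon_n\big) \le \frac{6\delta}{\pi^{2}n^{2}S},
$$

so we obtain

$$
\sum_{n=1}^{\infty} P\big(|k_{s,n} - k_s(C)| > \epsilon_n\big) \le \frac{6\delta}{\pi^{2}S}\sum_{n=1}^{\infty}\frac{1}{n^{2}} = \frac{\delta}{S},
$$

due to the Basel problem,

$$
\sum_{n=1}^{\infty}\frac{1}{n^{2}} = \frac{\pi^{2}}{6}.
$$

Hence,

$$
P\left(\bigcup_{s,n}\big\{|k_{s,n} - k_s(C)| > \epsilon_n\big\}\right) \le \delta.
$$

It follows that

$$
P\big(|k_{s,n} - k_s(C)| \le \epsilon_n,\ \forall n,\ \forall s = 1, \ldots, S\big) \ge 1 - \delta.
$$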
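The union-bound step above can be checked numerically. Each pair $(s, n)$ is assigned failure probability $6\delta/(\pi^2 n^2 S)$, and the total approaches $\delta$ from below. Below is a small Python check; the names and the truncation at `n_max` are ours.

```python
import math


def per_event_budget(n, S, delta):
    """Failure probability 6*delta / (pi^2 n^2 S) allotted to arm s at round n."""
    return 6.0 * delta / (math.pi ** 2 * n ** 2 * S)


def total_budget(S, delta, n_max):
    """Union-bound total over s = 1..S and n = 1..n_max."""
    return S * sum(per_event_budget(n, S, delta) for n in range(1, n_max + 1))


if __name__ == "__main__":
    print(total_budget(S=25, delta=0.1, n_max=100_000))  # just below delta = 0.1
```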


Lemma 1. Setting $\psi$ to

$$
\psi = \frac{\tfrac{b - C}{2}\cdot\tfrac{b - C}{2}\,\zeta}{1 - \tfrac{b - C}{2}\,\zeta} > 0
$$

satisfies $C - k(C) \ge \psi$.
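Proof of Lemma 1: We argue that

$$
E[X \mid X > C - \psi] \ge C,
$$

which implies that $C - k(C) \ge \psi$. Indeed,

$$
\begin{aligned}
E[X \mid X > C - \psi]
&= \frac{E[X\,I(X > C - \psi)]}{P[X > C - \psi]} \\
&= \frac{E[X\,I(C - \psi < X \le C + \tfrac{b - C}{2})]}{P[X > C - \psi]} + \frac{E[X\,I(X > C + \tfrac{b - C}{2})]}{P[X > C - \psi]} \\
&\ge \frac{E[(C - \psi)\,I(C - \psi < X \le C + \tfrac{b - C}{2})]}{P[X > C - \psi]} + \frac{E[(C + \tfrac{b - C}{2})\,I(X > C + \tfrac{b - C}{2})]}{P[X > C - \psi]} \\
&= (C - \psi)\,\frac{P[C - \psi < X \le C + \tfrac{b - C}{2}]}{P[X > C - \psi]} + \left(C + \tfrac{b - C}{2}\right)\frac{P[X > C + \tfrac{b - C}{2}]}{P[X > C - \psi]} \\
&= (C - \psi)\left(1 - \frac{P[X > C + \tfrac{b - C}{2}]}{P[X > C - \psi]}\right) + \left(C + \tfrac{b - C}{2}\right)\frac{P[X > C + \tfrac{b - C}{2}]}{P[X > C - \psi]} \\
&\ge (C - \psi)\left(1 - P[X > C + \tfrac{b - C}{2}]\right) + \left(C + \tfrac{b - C}{2}\right)P[X > C + \tfrac{b - C}{2}] \\
&\ge (C - \psi)\left(1 - P[X > C + \tfrac{b - C}{2}]\right) + \left(C + \tfrac{b - C}{2}\right)\left(b - C - \tfrac{b - C}{2}\right)\zeta \\
&\ge (C - \psi)\left(1 - \tfrac{b - C}{2}\,\zeta\right) + \left(C + \tfrac{b - C}{2}\right)\tfrac{b - C}{2}\,\zeta \\
&\ge C - \psi\left(1 - \tfrac{b - C}{2}\,\zeta\right) + \tfrac{b - C}{2}\cdot\tfrac{b - C}{2}\,\zeta.
\end{aligned}
$$

We must ensure that $\psi$ is small enough that the right-hand side is at least $C$. By inspection,

$$
\psi \le \frac{\tfrac{b - C}{2}\cdot\tfrac{b - C}{2}\,\zeta}{1 - \tfrac{b - C}{2}\,\zeta},
$$

which completes the proof.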
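As a sanity check of Lemma 1 (not part of the original proof), consider $X \sim \text{Uniform}[a, b]$. Its density is $\zeta = 1/(b - a)$ everywhere on the support and $E[X \mid X > k] = (k + b)/2$, so $k(C) = 2C - b$ whenever $(a + b)/2 \le C < b$. The Python sketch below reads $\zeta$ as a lower bound on the density, which is an assumption about Assumption A3. On a few hypothetical cases, it confirms $0 < \psi \le C - k(C)$.

```python
def uniform_k_of_C(a, b, C):
    """Root of E[X | X > k] = C for X ~ Uniform[a, b], where E[X | X > k] = (k + b) / 2.

    Valid when (a + b) / 2 <= C < b, so that k(C) lies in [a, b].
    """
    if not (a + b) / 2 <= C < b:
        raise ValueError("need (a + b) / 2 <= C < b")
    return 2 * C - b


def lemma1_psi(b, C, zeta):
    """psi from Lemma 1: h * h * zeta / (1 - h * zeta) with h = (b - C) / 2."""
    h = (b - C) / 2
    return h * h * zeta / (1 - h * zeta)


if __name__ == "__main__":
    for a, b in [(0.0, 100.0), (-100.0, 100.0), (10.0, 20.0)]:
        for frac in (0.55, 0.7, 0.9, 0.99):
            C = a + frac * (b - a)
            gap = C - uniform_k_of_C(a, b, C)            # equals b - C
            psi = lemma1_psi(b, C, zeta=1.0 / (b - a))   # uniform density
            print(a, b, round(C, 3), round(gap, 4), round(psi, 4))
```

For the uniform case the bound reduces to $\psi = (b - C)^2 / (2(b - 2a + C)) \le b - C$. This shows how conservative the lemma's choice of $\psi$ can be.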
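A.2 Proof of Theorem 2

Equation 3.16 leads to

$$
P(\alpha_n - \alpha > \epsilon) \le P\big(\hat{F}_n(k_n) - \hat{F}_n(k(C)) > q\epsilon\big) + P\big(\hat{F}_n(k(C)) - F(k(C)) > (1 - q)\epsilon\big), \tag{A.4}
$$

for $0 < q < 1$ and $\epsilon > 0$. For the first term,

$$
P\big(\hat{F}_n(k_n) - \hat{F}_n(k(C)) > q\epsilon\big) = P\left(\frac{1}{n}\sum_{i=1}^{n} I(k(C) < X_i \le k_n) > q\epsilon\right).
$$

If $(1/n)\sum_{i=1}^{n} I(k(C) < X_i \le k_n) > q\epsilon$, then

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k_n) - \frac{1}{n}\sum_{i=1}^{n}(X_i - C)\,I(X_i > k(C))
&= \frac{1}{n}\sum_{i=1}^{n}(C - X_i)\,I(k(C) < X_i \le k_n) \\
&> (C - k_n)\,q\epsilon.
\end{aligned}
$$

Since $0 < (1/n)\sum_{i=1}^{n}(X_i - C)I(X_i > k_n) < (b - C)/n$, for $0 < \xi < \psi$,

$$
\begin{aligned}
P\left(\frac{1}{n}\sum_{i=1}^{n} I(k(C) < X_i \le k_n) > q\epsilon\right)
&\le P\left(\frac{1}{n}\sum_{i=1}^{n}(C - X_i)\,I(X_i > k(C)) > (C - k_n)\,q\epsilon - (b - C)/n\right) \\
&= P\left(\frac{1}{n}\sum_{i=1}^{n}(C - X_i)\,I(X_i > k(C)) > (C - k_n)\,q\epsilon - (b - C)/n;\ k_n - k(C) > \xi\right) \\
&\quad + P\left(\frac{1}{n}\sum_{i=1}^{n}(C - X_i)\,I(X_i > k(C)) > (C - k_n)\,q\epsilon - (b - C)/n;\ k_n - k(C) \le \xi\right) \\
&\le P\big(k_n - k(C) > \xi\big) + P\left(\frac{1}{n}\sum_{i=1}^{n}(C - X_i)\,I(X_i > k(C)) > (\psi - \xi)\,q\epsilon - (b - C)/n\right),
\end{aligned}
$$

by Assumption A3 and Lemma 1. Hence,

$$
P\big(\hat{F}_n(k_n) - \hat{F}_n(k(C)) > q\epsilon\big) \le \exp\left(-2n\left(\frac{\psi\zeta\xi}{b - a}\right)^{2}\right) + \exp\left(-2n\left(\frac{(\psi - \xi)\,q\epsilon - (b - C)/n}{b - a}\right)^{2}\right),
$$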
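again by Equation A.3 and Hoeffding's Lemma, for $n > (b - C)/((\psi - \xi)q\epsilon)$. Subsequently, regarding the second term in Equation A.4,

$$
P\big(\hat{F}_n(k(C)) - F(k(C)) > (1 - q)\epsilon\big) \le \exp\left(-2n(1 - q)^{2}\epsilon^{2}\right).
$$

Thus, in summary, we observe that

$$
\begin{aligned}
P(\alpha_n - \alpha > \epsilon)
&\le \exp\left(-2n\left(\frac{\psi\zeta\xi}{b - a}\right)^{2}\right) + \exp\left(-2n\left(\frac{(\psi - \xi)\,q\epsilon - (b - C)/n}{b - a}\right)^{2}\right) + \exp\left(-2n(1 - q)^{2}\epsilon^{2}\right) \\
&\le 3\exp\left(-2n\min\left\{\frac{\psi\zeta\xi}{b - a},\ \frac{(\psi - \xi)\,q\epsilon - (b - C)/n}{b - a},\ (1 - q)\epsilon\right\}^{2}\right).
\end{aligned}
$$

Whilst unoptimised in this work, we set $\xi = \psi/2$ and $q = 1/2$ (an optimisation of these parameters could occur in future work), so that

$$
P(\alpha_n - \alpha > \epsilon) \le 3\exp\left(-2n\min\left\{\frac{\psi^{2}\zeta\epsilon}{2(b - a)},\ \frac{\psi\epsilon/2 - (b - C)/n}{2(b - a)},\ \frac{\epsilon}{2}\right\}^{2}\right)
$$

for $n > 2(b - C)/(\psi\epsilon)$. The analysis of $P(\alpha_n < \alpha - \epsilon)$ is similar and results in an identical exponential bound; the proof is omitted for the sake of brevity. The conclusion we obtain is that

$$
P(|\alpha_n - \alpha| > \epsilon) \le 6\exp\left(-2n\min\left\{\frac{\psi^{2}\zeta\epsilon}{2(b - a)},\ \frac{\psi\epsilon/2 - (b - C)/n}{2(b - a)},\ \frac{\epsilon}{2}\right\}^{2}\right)
$$

for $n > 2(b - C)/(\psi\epsilon)$. As in the proof of Theorem 1, for $0 < \delta < 1$ chosen by the agent, $\epsilon_n$ is set so that

$$
6\exp\left(-2n\min\left\{\frac{\psi^{2}\zeta\epsilon_n}{2(b - a)},\ \frac{\psi\epsilon_n/2 - (b - C)/n}{2(b - a)},\ \frac{\epsilon_n}{2}\right\}^{2}\right) = \frac{6\delta}{\pi^{2}n^{2}S},
$$

which leads to

$$
\epsilon_n = \sqrt{\frac{2}{n}\log\left(\frac{\pi^{2}n^{2}S}{\delta}\right)}\,\max\left\{\frac{b - a}{\psi^{2}\zeta},\ \frac{2(b - a) + (b - C)/n}{\psi},\ 1\right\}.
$$

By standard arguments, as in the proof of Theorem 1, it follows that

$$
P\big(|\alpha_{s,n} - \alpha_s(C)| \le \epsilon_n,\ \forall n,\ \forall s = 1, \ldots, S\big) \ge 1 - \delta.
$$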
A.3 MATLAB Implementation of Algorithm 4

function [result, epsilon, verbose_arms, mu, optimal_arm, root_max, ...
    expected_samples] = VaR_truncnorm(C, sigma, a, b, S, delta, obs, max_iter)
%
% Adam J Hepworth
% Naval Postgraduate School
%
% Sequential quantile (VaR) elimination over S truncated-normal arms.

verbose_arms = true(1, S);                      % arms still in contention
mu = linspace(a + 1, C - 1, S);                 % linearly spaced arm means
y = zeros(size(mu, 2), (max_iter + 1) * obs);   % raw observations
x = zeros(size(mu, 2), (max_iter + 1) * obs);   % observations sorted per arm
result = zeros(size(mu, 2), max_iter + 1);      % empirical root k_n per arm
epsilon = zeros(1, max_iter + 1);
root_true = zeros(1, S);

i = 0;

% Lower bound zeta on the truncated normal densities over [a, b]
zeta = min(min(normpdf((a - mu) / sigma), normpdf((b - mu) / sigma)) ...
    ./ (sigma * (normcdf((b - mu) / sigma) - normcdf((a - mu) / sigma))));

% True root k(C) of each arm, used to set psi = min_s (C - k_s(C))
for s = 1:size(mu, 2)
    root_true(s) = fzero(@(k) sigma * (normpdf((k - mu(s)) / sigma) ...
        - normpdf((b - mu(s)) / sigma)) + mu(s) .* (normcdf((b - mu(s)) / sigma) ...
        - normcdf((k - mu(s)) / sigma)) - C * (normcdf((b - mu(s)) / sigma) ...
        - normcdf((k - mu(s)) / sigma)), [a, C]);
end

psi = min(C - root_true);

while (sum(double(verbose_arms)) > 1) && (i < max_iter)
    for s = 1:size(mu, 2)
        if ~verbose_arms(s)
            result(s, i + 1) = NaN;   % eliminated arms are not sampled or ranked
            continue
        end
        y(s, i * obs + (1:obs)) = random(truncate(makedist('Normal', mu(s), sigma), ...
            a, b), [1, obs]);
        x(s, 1:(i + 1) * obs) = sort(y(s, 1:(i + 1) * obs), 'descend');
        % Number of positive partial sums of (x - C), largest values first
        root_eval = size(find(cumsum(x(s, 1:(i + 1) * obs) - C) > 0), 2);
        if (root_eval > 0)
            result(s, i + 1) = x(s, root_eval);
        else
            result(s, i + 1) = a;
        end
    end

    epsilon(1 + i) = sqrt((.5 / (obs * (i + 1))) ...
        * log((pi^2 * (obs * (i + 1))^2 * S) / (3 * delta))) ...
        * (b - a) / (zeta * psi);

    % max ignores the NaN entries of eliminated arms
    root_max(i + 1) = max(result(:, i + 1));
    optimal_arm(i + 1) = find(result(:, i + 1) == root_max(i + 1), 1);

    for s = 1:size(mu, 2)
        if verbose_arms(s) && (s ~= optimal_arm(i + 1))
            if abs(root_max(i + 1) - result(s, i + 1)) > (2 * epsilon(1 + i))
                result(s, i + 1) = NaN;
                verbose_arms(s) = false;
            end
        end
    end

    % (S - 1) counts the arms other than the current leader
    expected_samples(i + 1) = ((8 * (b - a)^2) / (psi^2 * zeta^2)) ...
        * log((pi^2 * S) / (3 * delta)) ...
        * ((S - 1) * (4 * epsilon(i + 1))^(-2));
    disp(i / max_iter * 100)   % progress, in percent

    i = i + 1;
end

save('VaR_truncnorm_data.mat');
end

%
% end of program
%

Listing 1: Implementation of the Sequential Quantile Elimination Algorithm 
for the Truncated Normal Distribution 
A.4 MATLAB Implementation of Algorithm 5

function [result, epsilon, verbose_arms, mu, optimal_arm, root_max, ...
    expected_samples] = CVaR_truncnorm(C, sigma, a, b, S, delta, obs, max_iter)
%
% Adam J Hepworth
% Naval Postgraduate School
%
% Sequential superquantile (CVaR) elimination over S truncated-normal arms.

verbose_arms = true(1, S);                      % arms still in contention
mu = linspace(a + 1, C - 1, S);                 % linearly spaced arm means
y = zeros(size(mu, 2), (max_iter + 1) * obs);   % raw observations
x = zeros(size(mu, 2), (max_iter + 1) * obs);   % observations sorted per arm
result = zeros(size(mu, 2), max_iter + 1);      % empirical statistic per arm
epsilon = zeros(1, max_iter + 1);
root_true = zeros(1, S);

i = 0;

% Lower bound zeta on the truncated normal densities over [a, b]
zeta = min(min(normpdf((a - mu) / sigma), normpdf((b - mu) / sigma)) ...
    ./ (sigma * (normcdf((b - mu) / sigma) - normcdf((a - mu) / sigma))));

% True root k(C) of each arm, used to set psi = min_s (C - k_s(C))
for s = 1:size(mu, 2)
    root_true(s) = fzero(@(k) sigma * (normpdf((k - mu(s)) / sigma) ...
        - normpdf((b - mu(s)) / sigma)) + mu(s) .* (normcdf((b - mu(s)) / sigma) ...
        - normcdf((k - mu(s)) / sigma)) - C * (normcdf((b - mu(s)) / sigma) ...
        - normcdf((k - mu(s)) / sigma)), [a, C]);
end

psi = min(C - root_true);

while (sum(double(verbose_arms)) > 1) && (i < max_iter)
    for s = 1:size(mu, 2)
        if ~verbose_arms(s)
            result(s, i + 1) = NaN;   % eliminated arms are not sampled or ranked
            continue
        end
        y(s, i * obs + (1:obs)) = random(truncate(makedist('Normal', mu(s), sigma), ...
            a, b), [1, obs]);
        x(s, 1:(i + 1) * obs) = sort(y(s, 1:(i + 1) * obs), 'descend');
        % Number of positive partial sums of (x - C), largest values first
        root_eval = size(find(cumsum(x(s, 1:(i + 1) * obs) - C) > 0), 2);
        if (root_eval > 0)
            % Fraction of observations outside the top root_eval values
            result(s, i + 1) = ((i + 1) * obs - root_eval) / (obs * (i + 1));
        else
            result(s, i + 1) = 0;
        end
    end

    epsilon(1 + i) = real(sqrt((2 / (obs * (i + 1))) ...
        * log((pi^2 * (obs * (i + 1))^2 * S) / delta)) ...
        * max([((b - a) / (zeta * psi^2)); ((2 * (b - a) + (b - C) ...
        / (obs * (i + 1))) / psi); 1]));

    % max ignores the NaN entries of eliminated arms
    root_max(i + 1) = max(result(:, i + 1));
    optimal_arm(i + 1) = find(result(:, i + 1) == root_max(i + 1), 1);

    for s = 1:size(mu, 2)
        if verbose_arms(s) && (s ~= optimal_arm(i + 1))
            if abs(root_max(i + 1) - result(s, i + 1)) > (2 * epsilon(1 + i))
                result(s, i + 1) = NaN;
                verbose_arms(s) = false;
            end
        end
    end

    % (S - 1) counts the arms other than the current leader
    expected_samples(i + 1) = 32 * (max([((b - a) / (zeta * psi^2)); ...
        ((2 * (b - a) + (b - C) / (obs * (i + 1))) / psi); 1]))^2 ...
        * log((pi^2 * S) / (3 * delta)) ...
        * ((S - 1) * (4 * epsilon(i + 1))^(-2));
    disp(i / max_iter * 100)   % progress, in percent

    i = i + 1;
end

save('CVaR_truncnorm_data.mat');
end

%
% end of program
%

Listing 2: Implementation of the Sequential Superquantile Elimination 
Algorithm for the Truncated Normal Distribution 